DEV Community: Ovid

Rewriting the Monolith, Part 2

Ovid — Wed, 19 May 2021 07:03:36 +0000

How To Rewrite The Monolith

In part 1 of this article, we gave an overview of some (very) basic architectural concepts that you will need to know for the actual rewrite. Now we'll explain how to make that rewrite successful. Again, this isn't the only way to make this work, but it's a solid approach that works better than just sitting down and saying "we'll just rewrite everything from scratch."

OpenAPI

We need to understand OpenAPI. OpenAPI, formerly known as Swagger, is a specification for defining your REST API. From that definition, you can generate clients and/or servers that conform to the API. To save time, you can use OpenAPI Generator to generate a working client or a server stub (you have to supply the actual logic for the server, of course) for a wide variety of languages.

Furthermore, if you design your OpenAPI document well, you can get free documentation for it!

OpenAPI is truly a powerful tool. What makes it a good choice for rewrites is that it helps you create a black box with an extremely well-defined interface. We'll need that.

Here's what part of an OpenAPI document might look like. This is YAML, but you can use JSON if you prefer.

paths:
  /product/category/{category}:
    get:
      summary: Returns a list of products for a given category.
      parameters:
        - name: category
          in: path
          required: true
          schema:
            type: string
      responses:
        '200':
          description: A JSON array of product objects
          content:
            application/json:
              schema:
                type: array
                items:
                  type: object

In the above snippet, we have routes like GET /product/category/some_category. Your server is expected to pass some_category to the endpoint that handles that route, and for a 200 OK response, it must return an array of objects.

Of course, you can return other responses and define their response structure, too, allowing the client to know how to automatically handle 404s and 500s, rather than simply guessing.
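To make that contract concrete, here is a minimal sketch of a server for that one route, written as a bare PSGI application using only core Perl modules. The in-memory %PRODUCTS_BY_CATEGORY data and the shape of the 404 error body are hypothetical stand-ins for your real search layer; a generated stub would give you the routing, and you would supply the lookup.

```perl
use strict;
use warnings;
use JSON::PP ();

# Hypothetical in-memory data standing in for the real search layer.
my %PRODUCTS_BY_CATEGORY = (
    widgets => [
        { id => 1, name => 'Left-handed widget' },
        { id => 2, name => 'Right-handed widget' },
    ],
);

# utf8 => emit UTF-8 bytes (PSGI bodies are bytes); canonical => stable key order.
my $JSON = JSON::PP->new->utf8->canonical;

sub json_response {
    my ( $status, $data ) = @_;
    return [
        $status,
        [ 'Content-Type' => 'application/json; charset=utf-8' ],
        [ $JSON->encode($data) ],
    ];
}

# A PSGI application is a code ref: it receives the environment hash and
# returns [ $status, \@headers, \@body ].
my $app = sub {
    my ($env) = @_;
    my $method = $env->{REQUEST_METHOD} // '';
    my $path   = $env->{PATH_INFO}      // '';

    # GET /product/category/{category}
    if ( $method eq 'GET' && $path =~ m{\A/product/category/([^/]+)\z} ) {
        my $category = $1;
        my $products = $PRODUCTS_BY_CATEGORY{$category};
        return json_response( 200, $products ) if $products;    # array of objects
        return json_response( 404, { error => "Unknown category: $category" } );
    }
    return json_response( 404, { error => 'Not Found' } );
};
```

Because the app is just a code ref, a test can call it directly without starting a server. To serve it with a PSGI server such as plackup, the .psgi file would need to return $app as its last expression.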

Using OpenAPI to rewrite

If you look back at the discussion of layering in part 1, you'll start to think of your horizontal layers (search, shopping cart, checkout, etc.) as potential services you can expose through OpenAPI. If you've never done anything like this before, you'll need to tread carefully.

You're going to take one of those horizontal layers and convert it to OpenAPI. Which one is up to you, but to start with, I'd recommend the smallest, simplest layer you can find. This will make the OpenAPI specification easier to write and will give you more experience in how to make this work.

First, you identify all of the core functionality that this layer needs to expose. For search, maybe it's just search by category and search by name. So you'll need at least two endpoints that are exposed.

Build out your OpenAPI spec, build out your server stubs, and then fill them in, writing tests against the API along the way.

Next, you start slowly disentangling the search layer from the rest of the code, often like picking apart a knot. Depending on how well your system is designed, this could be quick and easy, or slow and painful. Over time, this gets easier with each new layer, both because earlier layers have already been removed and because you have more experience doing it.

As part of this disentangling, you'll identify everywhere in your code that requires those searches. One by one, change them to call your OpenAPI routes instead of calling the search code directly.
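Here is one way that switch-over might look from the calling code's side. It is a sketch that assumes the search service is reachable at some base URL and exposes the GET /product/category/{category} route from earlier. The function name products_in_category and the choice to treat a 404 as an empty list are illustrative assumptions, not part of any generated client.

```perl
use strict;
use warnings;
use HTTP::Tiny;
use JSON::PP ();

# Percent-encode one URL path segment, treating the argument as a character
# string: encode to UTF-8, then escape everything except RFC 3986 unreserved
# characters.
sub escape_segment {
    my ($segment) = @_;
    utf8::encode($segment);
    $segment =~ s/([^A-Za-z0-9\-._~])/sprintf '%%%02X', ord $1/ge;
    return $segment;
}

# Hypothetical replacement for a direct call into the search code: callers now
# go through the published GET /product/category/{category} route instead.
# $http is any object with HTTP::Tiny's get($url, \%options) interface.
sub products_in_category {
    my ( $http, $base_url, $category ) = @_;
    my $url = "$base_url/product/category/" . escape_segment($category);
    my $res = $http->get( $url, { headers => { Accept => 'application/json' } } );

    # One possible policy: an unknown category simply means "no products".
    return [] if $res->{status} == 404;
    die "Search service error: $res->{status} $res->{reason}\n"
        unless $res->{success};
    return JSON::PP::decode_json( $res->{content} );
}
```

Because the HTTP client is passed in, production code can call products_in_category(HTTP::Tiny->new(timeout => 5), 'http://localhost:7777', 'widgets'), while tests hand it a stub object with a get method.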

When done, you've started on the path to creating a SOA (Service-Oriented Architecture). Before we go on, let's take a look at where we are.

First, more code can conceivably use the service because it's much less coupled with the existing code. Second, you might find you have an actual product! If your service is interesting enough, you could potentially offer it to the outside world. Imagine selling something that hobbyists enjoy: with an API key and a little bit of JavaScript, their websites could embed your search.

Heck, you could even write the JavaScript for them and publish it on your site to make it easier for others to include your content on their sites.

Peeling off the layer

Now that we've gotten a good start on creating a service, we need to make it more robust. Specifically, you want to separate out this code so that it can be deployed separately and removed from the original repository. The details of that are very much up to you and your team. However, when it can be deployed separately, you could even deploy it on other servers in order to lighten the load on the main app servers. In fact, that's one of the nice benefits of SOA: decoupling components so you can manage them separately.

As part of this "peeling off", you also want to write even more tests against the OpenAPI endpoints, with particular emphasis on capturing corner cases that might not be obvious. What's important is that you try to avoid writing tests that depend on the particular programming language you use. Instead, treat the service as a black box and adopt a data-driven approach. You have a series of tests in, say, a YAML document, and each is a request/response cycle. The beginning of the document might look like this:

---
tests:
  - name: Fetch widgets category
    request:
      body: |
        GET /product/category/widgets HTTP/1.1
        Accept-Encoding: gzip
        User-Agent: Mojolicious (Perl)
        Content-Length: 0
        X-API-Key: 7777777
        Host: localhost:7777
        Content-Type: application/json
    response:
      body: |
        HTTP/1.1 200 OK
        Accept-Ranges: none
        Vary: accept-encoding
        Date: Thu, 25 Mar 2021 15:56:48 GMT
        Content-Length: 88
        Content-Type: application/json; charset=utf-8
        Connection: keep-alive

        [{...}]

Why a data-driven approach? First, it discourages "cheating" by quietly relying on white-box knowledge of the implementation. Second, it makes the rewrite phase much easier.
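A runner for that kind of test document can stay very small. The sketch below assumes each case has already been loaded from YAML into a Perl hash with the request/response structure shown above (for example with YAML::XS's LoadFile), that response bodies are real JSON rather than the elided placeholder, and that the service is a PSGI application it can call in-process. It compares the status code, the Content-Type, and the decoded JSON body, ignoring volatile headers such as Date; those comparison rules are assumptions you would tune for your own service.

```perl
use strict;
use warnings;
use JSON::PP ();

# Split a raw HTTP message into its start line, headers (lower-cased names)
# and body.
sub parse_http_message {
    my ($text) = @_;
    my ( $head, $body ) = split /\r?\n\r?\n/, $text, 2;
    my ( $start, @header_lines ) = split /\r?\n/, $head;
    my %headers;
    for my $line (@header_lines) {
        my ( $name, $value ) = split /:\s*/, $line, 2;
        $headers{ lc $name } = $value;
    }
    return ( $start, \%headers, $body // '' );
}

# Run one request/response case against a PSGI app, treating it as a black
# box. Returns a list of failure messages; an empty list means it passed.
sub run_case {
    my ( $app, $case ) = @_;
    my ($request_line) = parse_http_message( $case->{request}{body} );
    my ( $method, $path ) = split ' ', $request_line;
    my ( $status_line, $expected_headers, $expected_body ) =
        parse_http_message( $case->{response}{body} );
    my ($expected_status) = $status_line =~ m{\AHTTP/\S+\s+(\d{3})};

    my $res = $app->( { REQUEST_METHOD => $method, PATH_INFO => $path } );
    my %got_headers;
    my @pairs = @{ $res->[1] };    # PSGI headers are a flat name/value list
    while ( my ( $k, $v ) = splice @pairs, 0, 2 ) { $got_headers{ lc $k } = $v }
    my $got_body = join '', @{ $res->[2] };

    my @failures;
    push @failures, "status: expected $expected_status, got $res->[0]"
        unless $res->[0] == $expected_status;
    if ( my $want_type = $expected_headers->{'content-type'} ) {
        my $got_type = $got_headers{'content-type'} // 'none';
        push @failures, "content-type: expected $want_type, got $got_type"
            unless $got_type eq $want_type;
    }
    # Compare JSON structurally: decode both, re-encode with sorted keys.
    my $canon = JSON::PP->new->canonical;
    my $same  = eval {
        $canon->encode( JSON::PP::decode_json($got_body) )
            eq $canon->encode( JSON::PP::decode_json($expected_body) );
    };
    push @failures, 'body differs (or is not valid JSON)' unless $same;
    return @failures;
}
```

Because the cases are plain data, the rewrite team can keep feeding the same document to a runner in their own language.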

The actual rewrite

You've peeled off a horizontal layer. It can be deployed separately. You run your data-driven, black-box tests against it. You use Redoc or something similar to generate full documentation for it automatically. Now you hand it off to the rewrite team to be rewritten in the target language.

First, they use OpenAPI Generator to generate full server stubs for their language of choice. Then they take the documentation and your tests and fill in each server stub, continuing until all tests pass. At that point, they can deploy, your sysadmins can switch the URL/port in your API gateway, and away you go! If it fails, they switch it back and you figure out what went wrong.
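Conceptually, the cutover is just a routing change. Real gateways (nginx, HAProxy, a cloud API gateway) express this in their own configuration; the Perl sketch below only models the idea, with hypothetical service names and URLs. Both upstreams stay registered, so switching to the rewritten service and switching back are the same one-line operation.

```perl
use strict;
use warnings;

# Hypothetical upstream services known to the gateway.
my %UPSTREAMS = (
    search_legacy => 'http://legacy-app:8080',
    search_v2     => 'http://search-v2:7777',
);

# Path prefix => service name. Only this table changes at cutover.
my %ROUTES = ( '/product' => 'search_legacy' );

# Resolve a request path to a full upstream URL; the longest prefix wins.
sub upstream_for {
    my ($path) = @_;
    for my $prefix ( sort { length $b <=> length $a } keys %ROUTES ) {
        next unless $path eq $prefix || index( $path, "$prefix/" ) == 0;
        return $UPSTREAMS{ $ROUTES{$prefix} } . $path;
    }
    return undef;    # no route: the gateway would answer 404 itself
}

# Point a prefix at another service. Returns the previous service so the
# change can be rolled back with a second call.
sub switch_route {
    my ( $prefix, $service ) = @_;
    die "Unknown service: $service\n" unless exists $UPSTREAMS{$service};
    my $previous = $ROUTES{$prefix};
    $ROUTES{$prefix} = $service;
    return $previous;
}
```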

Your tests were data-driven to enforce the black-box approach, but also so that the rewrite team can easily port them to their target language, if they so desire, and extend them.

While this all seems very hand-wavy at the end, the reality is that most of the heavy lifting has been done at this point. It's taken a lot to get here, but it's a heck of a lot less than rewriting the entire application from scratch.

Once you've successfully pulled the first layer out into its own OpenAPI service, you keep doing this with successive layers, handing them off to the rewrite teams.

Why This Works

The "never rewrite" dogma got locked in hard after Joel Spolsky's article about it. And frankly, he makes very good points, but he was talking about monolithic applications. By deconstructing them into SOA, you can do small chunks at a time, reducing the risk considerably. And with the OpenAPI approach outlined above, you have great documentation and tests to give your rewrite developers a fighting chance, something that is often lacking in traditional rewrite scenarios.

Obstacles

It wouldn't be fair to say it's all rainbows and unicorns. It's often not easy to extract a horizontal layer from a monolithic application.

You also have to worry about the quality of the tests being written. It's amazingly easy to skimp on tests and only write a few to demonstrate the basics. It's even easier when you are writing tests for code that's already working. But the new team won't have working code to fall back on: the tests are their specification, so if the tests aren't up to snuff, the rewrite won't be either.

And if Perl is either the source or the destination language, you'll be disappointed to learn that the OpenAPI Generator doesn't have code to handle server stubs for Perl (it can generate a client, though). There's Mojolicious::Plugin::OpenAPI which can help, but not only does it require weaving in non-standard bits like x-mojo-to and x-mojo-name into your OpenAPI doc, but you have to wire all of the bits together yourself. We've done this and frankly, it's not fun. Further, some of our clients have very old codebases that run on older versions of Perl than Mojolicious supports, thus rendering it useless for them.

To address that, our company has started to develop Net::OpenAPI. It's framework agnostic, and while it does rely on JSON::Validator, which in turn relies on Mojolicious, we almost have a full JSON::Validator58 working, which should support older versions of Perl.

We still have a lot more work to do on this, but so far it will build out full server stubs, with a PSGI file, and working documentation using Redoc. You just need to fill in the endpoints:

endpoint 'get /pet/{petId}' => sub {
    my ( $request, $params ) = @_;

    # replace this code with your code
    return Net::OpenAPI::App::Response->new(
        status_code => HTTPNotImplemented,
        body        => {
            error => 'Not Implemented',
            code  => HTTPNotImplemented,
            info  => 'get /pet/{petId}',
        },
    );
};

It's open source, and free software, so if you'd like to contribute, pull requests welcome!

Digressions

As a final note, there are a couple of points we need to touch on.

SOA vs Microservices

What's the difference? Are we developing microservices here, or heading towards an SOA?

The easy answer is that microservices are small and self-contained applications, while the services in SOA are larger, more "enterprise-ready" tools. But that doesn't really help.

Microservices are generally expected to have all the logic and data they need to perform a task. They're not coordinating with a bunch of other services and thus are loosely coupled and fit well with agile development.

SOA services, however, need to coordinate with other services, in the same way that your Order object needs a Collection of Item objects, along with a Customer object, and so on. In the layering example from part 1, all three layers (product search, shopping cart, and payment) will need to know about items, but they'll frequently treat them as data rather than as instantiated objects that are passed around.

Duplicate Code

But won't this result in a lot of duplicated code?

For microservices, if they're completely stand-alone but you have several which need to deal with the same business logic, then yes, you could easily have duplicated code and data. This IBM article sums up the dilemma nicely:

In SOA, reuse of integrations is the primary goal, and at an enterprise level, striving for some level of reuse is essential.

In microservices architecture, creating a microservices component that is reused at runtime throughout an application results in dependencies that reduce agility and resilience. Microservices components generally prefer to reuse code by copy and accept data duplication to help improve decoupling.

So when striving for a SOA, because different services coordinate, you can strive for the "single responsibility principle" and avoid duplication. However, if you're rewriting, you probably have a legacy mess. That means technical debt. That means hard decisions. You may have to copy code to get started.

Full or partial rewrite?

One of the benefits of this approach is that you can choose to go for a partial rewrite. Just as you might throw away your custom search engine code in favor of Elasticsearch or Typesense, you might find that a CPU-intensive part of your application can be safely rewritten in Go or Rust. You don't have to make this an all-or-nothing scenario. You can also take the time to do it right, pausing as needed, rather than enduring the death march of the full rewrite.

TL;DR

The "never rewrite" mantra needs to stop. Instead, it should be "almost never rewrite", but if you must, do so incrementally. We've laid out a clean approach, but be warned: it's clean on paper. If you're backed into the rewrite corner, it's not easy getting out. But you can get out so long as you choose an incremental approach that's more likely to succeed—and bring plenty of patience and courage with you.

Cover Photo by Francesco Paggiaro from Pexels

Rewriting the Monolith, Part 1

Ovid — Wed, 19 May 2021 07:03:15 +0000

Oh no! You have to rewrite that huge legacy app! You already argued that you can fix the legacy code and you've sent the managers the Joel Spolsky "never rewrite" article. Management responds by pointing out that Basecamp and Visual Studio both had famously successful rewrites.

Do you update your CV or do you tuck in and get the job done?

Sometimes the rewrite is where you're at. Either you've made the decision or you were given the order. Maybe it's because your product is written in VB 6, or UniBasic, or Easytrieve (all of which I've programmed in) and you can't find developers. Or maybe the software is a slow, CPU-bound memory hog and Python or Perl just aren't great choices there.

Whatever your reasons, you need a clean strategy. A problem this hard can't be covered in a couple of paragraphs, but by the time we get to the end, I'll give you a fighting chance of not only making that rewrite succeed, but also avoiding exchanging one plate of spaghetti code for another. This isn't the only way to approach this problem, but I've seen this work and as an added bonus, you'll be developing a SOA (service-oriented architecture). If you're unsure why you should do that, read Steve Yegge's epic rant on the topic.

We start by giving a quick refresher on structured programming, along with vertical and horizontal layering of applications. If you're comfortable with that, you can jump straight to Part 2.

Structured Programming

I learned to program in the early 80s, first with BASIC before moving on to 6809 assembler. Being self-taught, "structured programming" wasn't a concept I was familiar with, and neither BASIC nor assembly prepared me for it. But moving on to C and later learning Warnier/Orr diagramming taught me the basics of structured programming. We all learn this after a while. This module handles payments while that module handles orders and this is how you use subroutines, and so on.

Unfortunately, for some programmers, that's where their design experience stops. They're not concerned with separation of concerns. They see no problem with that huge subroutine that concatenates a bunch of strings, based on conditionals, to form SQL, and then returns an HTML table that will be concatenated with other HTML snippets to try to make a web page.

Vertical Layering

Eventually we start learning about the vertical layers that our application might have. For example, in a classic MVC pattern, you might have a view layer, which accepts JSON from a controller and renders it as HTML. All kept very separate from the main logic. The controller merely dispatches requests from the view to the business model layer and that layer gets its data from a lower-level data layer (often a database).

The layers have distinct boundaries around their responsibilities. Ignoring this approach makes it much harder to manage different parts of an application. For example, I frequently see people using ORMs and then embedding business logic in them (I've made this mistake more than once). Then, when they try to change that logic, or change the data layer, they're working on code with two sets of responsibilities, and it gets hard to untangle them.

A key point to layering is to remember that, for vertical layering, each layer can only talk to adjacent layers. The view can talk to a controller, but never the data layer. Keeping vertical layers separate helps to minimize spaghetti code.
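A toy sketch of that rule follows, with each layer as a Perl package that only calls the layer directly beneath it. The package names, the active flag, and the in-memory user data are invented for illustration; in a real application the data layer would talk to a database and the view would use a template.

```perl
use strict;
use warnings;

# Each package talks only to an adjacent layer:
# Controller -> Model -> DataLayer, and Controller -> View for output.

package DataLayer {
    # Stand-in for the database; a real data layer would use DBI or an ORM.
    my %users = (
        1 => { id => 1, name => 'Alice', active => 1 },
        2 => { id => 2, name => 'Bob',   active => 0 },
    );
    sub fetch_users { return [ map { $users{$_} } sort { $a <=> $b } keys %users ] }
}

package Model {
    # Business rules live here: only active users are listed.
    sub active_users {
        return [ grep { $_->{active} } @{ DataLayer::fetch_users() } ];
    }
}

package View {
    # Presentation only: escape and render, no queries, no business rules.
    sub _h {
        my ($s) = @_;
        $s =~ s/&/&amp;/g;
        $s =~ s/</&lt;/g;
        $s =~ s/>/&gt;/g;
        $s =~ s/"/&quot;/g;
        return $s;
    }
    sub render_users {
        my ($users) = @_;
        return '<ul>' . join( '', map { '<li>' . _h( $_->{name} ) . '</li>' } @$users ) . '</ul>';
    }
}

package Controller {
    # Dispatch: ask the model for data, hand it to the view.
    sub list_users { return View::render_users( Model::active_users() ) }
}
```

If the rule about active users changes, only Model changes; if the storage changes, only DataLayer does.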

For an example of how bad layering can make your life miserable, you can read my Project 500 case study.

Horizontal Layering

Horizontal layering is less well-known, but it can be a great tool.

Imagine you have a ShoppingCart class for a simple e-commerce web site. Is that useful? Not by itself. You probably need plenty of other classes to make it useful. You might need a Customer object, and Product objects, and Currency objects, and all sorts of other things (assuming you're going OO).

So let's step back and ask ourselves what kinds of things that site might need. This example is obviously not a huge system, but we're keeping things simple:

Product search
Shopping cart
Payment

For many monolithic sites, those are often all lumped together in one big code base, with their edges overlapping. But if you start to think of those as services, you could look at those as horizontal layers.

You might think that you only want horizontal layers to talk to adjacent layers, but this time it's different. You have a variety of different services and they often need to share domain knowledge across different services. For example, if you also have a blog "service", when someone is searching for a product, you might show related blog entries. When reading the blog, you might be able to add an item directly to the shopping cart for that.
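To keep that shared domain knowledge from turning back into a tangle, one service should see another only through its published interface. In this sketch, a hypothetical BlogService asks a search client for products related to a post's tags. The client could be generated from the search service's OpenAPI document, but here it is any object with a products_matching method, which is an assumed name.

```perl
use strict;
use warnings;

# Hypothetical blog service that shows products related to a post. It knows
# nothing about how search works; it only uses the search service's interface.
package BlogService {
    sub new {
        my ( $class, %args ) = @_;
        # search_client: any object with products_matching($term), returning an
        # array ref of product hashes that carry an id (an assumed interface).
        die "search_client is required\n" unless $args{search_client};
        my $self = {%args};
        return bless $self, $class;
    }

    sub related_products {
        my ( $self, $post ) = @_;
        my @products = map { @{ $self->{search_client}->products_matching($_) } }
            @{ $post->{tags} || [] };
        # The same product may match several tags; keep the first occurrence.
        my %seen;
        return [ grep { !$seen{ $_->{id} }++ } @products ];
    }
}
```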

What's important is that your layers are separated cleanly, with each preferably being a black box. If you have a monolithic, legacy codebase, there's a good chance layers aren't separated cleanly (or at all). But you need to at least understand the concepts.

In part 2, we'll cover the actual process of rewriting.


Fixing Legacy Code

Ovid — Sat, 08 May 2021 10:04:07 +0000

If you've been in this business long enough, sooner or later you're going to be faced with a terrible problem: fixing the legacy codebase. What follows isn't the only way to proceed, but it's a tried-and-true strategy that unfortunately doesn't appear to be well-known. The core of what follows is risk minimization. Assuming you're facing the problem of fixing a legacy application, you already have risk and you don't need more added to it. What follows is lower risk and lower cost than rewriting the system from scratch.

If you like what you read here and you need help, especially with Perl, get in touch with me and see how our company can help out.

Why You (Probably) Don't Rewrite The Code

Before we start, there are a few things you should know. First, read this now-famous Joel Spolsky article about why you should never rewrite your code (trust me, read it, but don't forget to come back). In that article, Spolsky makes a strong case about why you should refactor your codebase instead of rewriting it. Refactoring, if you're not familiar with the term, is the process of making a series of gradual improvements to code quality without changing the behavior. When you're trying to fix code, trying to change its structure and behavior at the same time is begging for trouble.

That being said, I don't believe in the word "never". If your code is written in UniBasic, rewriting might be your only option since you can't find developers who know the language (or are willing to learn it). Heck, I used to program in UniBasic and I've forgotten the language entirely.

Or if you're working with a relatively small piece of software with low impact, rewriting may not be that dangerous.

But let's say you can find or train developers for the language your software is written in, the software is mission-critical, and it's a very large codebase. Rewriting begins to make much less sense. Refactoring it means that you always have working code, you're not throwing away business knowledge or obscure bug-fixes, and your developers aren't starting from scratch, hoping they can make something work. In other words, you're minimizing your risk.

That being said, many companies (and developers) still opt for the rewrite. New code is exciting. New code promises new opportunities. New code is fun but fixing old code is often seen as drudgery. However, if you have a large, legacy codebase, the new code you're writing is, by definition, a large project and large projects are very high risk (emphasis mine):

In a landmark 1995 study, the Standish Group established that only about 17% of IT projects could be considered "fully successful," another 52% were "challenged" (they didn't meet budget, quality or time goals) and 30% were "impaired or failed." In a recent update of that study conducted for ComputerWorld, Standish examined 3,555 IT projects between 2003 and 2012 that had labor costs of at least $10 million and found that only 6.4% of [IT projects] were successful.

That's an old study, but there's still plenty of newer work which bears this out. The larger the project, the larger the risk. In fact, of the large projects I have been involved with for various companies, few were both on time and on budget. Some were cancelled outright and still others dragged on, long after it was clear they were a disaster, simply because no one wanted to take the blame for failure. One of them was approaching its fourth year of a one-year schedule and was riddled with bugs and design flaws, but the company made the new software backwards-incompatible, switched over its clients, and now has no way out. The only reason the company is still in business is that it bought another company that is very profitable and is paying for the company's mistake.

That last example alludes to a dirty little secret that's often not talked about in our industry: large-scale rewrites often exchange one pile of spaghetti code for another. Rather than truly solve the underlying problems, the companies have traded a known set of problems for an unknown set of problems. If you need to fix your legacy code it's because you need to minimize your risk; why on earth would you knowingly adopt unquantifiable risk?

How to Refactor Your Legacy Code

Assuming you've decided that you don't want to face the cost and risk of a large-scale rewrite, how do you refactor your code?

First, you need to assess where you are. At a bare minimum:

What are the functional requirements of the code? (very high-level here)
What documentation exists, if any?
What areas of the code are more fragile? (Check your bug tracker)
What external resources does it require?
What tests exist, if any?

All of these things need to be written down so that anyone can consult this information at a glance. This information represents the bare necessities for the expert you're going to hire to fix the mess.

If the above list seems simplistic, that's because we're refactoring, not rewriting.

And yes, you're probably going to hire an outside expert. Not only will they see things that you didn't, but while your current developers may be good, if they can't clearly lay down a solid plan to fix the legacy codebase while simultaneously minimizing risk, you need to bring in someone with experience with this area. What follows is not always intuitive and the expert's experience will help you navigate the rough waters you're already in. At a minimum, your expert needs to have the following:

Expert in the primary language(s) of your codebase
A strong automated testing background
Very comfortable with code coverage tools
A strong database background (probably)
An expert in system design/architecture
Ability to admit when they're wrong
Understanding of business needs
A persuasive personality

The last point seems strange, but hard choices will need to be made and there will be strong disagreements about how to make them.

It's hard to find this mix in top-notch developers, but it will definitely pay off.

Getting Started

The first thing you'll want to do is get a rough idea of how you want your new application laid out. Call this your architecture roadmap, but keep in mind that your landscape will change over time and this roadmap should be flexible. This is where your expert's architecture skills will come in. Various functional parts of your application will be decoupled and put into separate areas to ensure that each part of your application has a "specialty" that it focuses on. When each part of your application has one area it focuses on, it's easier to maintain, extend, and reuse, and that's primarily why we want to fix our legacy codebase. However, don't make detailed plans at this time; no battle plan survives first contact with the enemy.

Instead, just ensure that you have a rough sense of where you're going to go.

Next, you're going to refactor your application the same way you eat an elephant: one bite (byte?) at a time. You'll pick a small initial target to get familiar with your new tools. Over time, it will get easier, but you don't want to bite off too big a chunk when you get started.

Refactoring a large application means writing tests, but unless you know what you're doing, you're probably going to get it wrong. There's often little TDD here — the code is already written — and you can't write tests for everything — you'll never finish. Instead, you'll be tactically applying integration tests piece by piece.

The first thing you need to do is understand what won't change in your application. By "won't change" I mean whatever it is that uses your application's output, whether it be through a JSON API, a Web site, a SOAP interface or what have you. Since something has to use the software, that something is what is going to make everything work.

You're going to be writing integration tests against whatever that something is. For the sake of argument, we'll assume we're refactoring a Web application. You've decided that you'll start by writing tests to verify that you can list users on your admin page.

Inside those tests, you'll create a browser object, log in as an admin user, fetch the users page and write tests to assert that the expected users show up on that page. Just getting to this point can often take a huge amount of work. For example, how do you get code to connect to a test database? How do you ensure data isolation between tests (in other words, the order in which tests are run should not matter)? Heck, how do you create that browser object (hint: Selenium is a good choice here)? These and many more questions need to be answered when you're first starting out.

Getting to this point may be easy if you already have some tests, or it may be very hard if you don't, but it's the important first step in the refactoring.

Once you have that first integration test targeting a small and (relatively) unchanging part of your interface, run your code coverage tools over the test(s) to see what code is covered with these high-level integration tests. Code which is covered is code which is generally safe to refactor (there are plenty of exceptions, but that's another article entirely).

Now you can start looking at which functional parts of the application are embedded in that tested code and make a plan for moving those sections into your architecture roadmap. At this point, it's tempting to rip everything apart, but don't give in to that temptation. Instead, focus on one piece at a time. For example, if you have SQL scattered throughout the code, start pulling that out into your architecture roadmap so that you have a clean API to work with the data you need. Or perhaps you have a Web application and you have been printing the HTML directly: look at using a templating system and start pulling the HTML out into templates. Don't fix everything at once or you'll be trying to do too much. Instead, focus on one area of responsibility and understand it well.
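As an example of pulling scattered SQL behind a clean API, here is a sketch of a small data-access class for the admin users page discussed above. The table and column names and the active filter are hypothetical. The $dbh argument is assumed to be a DBI database handle; selectall_arrayref with Slice => {} is DBI's standard way to get rows back as hash references.

```perl
use strict;
use warnings;

# Hypothetical data-access class: SQL that used to be scattered through
# page-building code now lives behind one method with a clear contract.
package UserStore {
    sub new {
        my ( $class, $dbh ) = @_;    # $dbh: a DBI database handle
        return bless { dbh => $dbh }, $class;
    }

    # Returns an array ref of hash refs: { id, name, email }.
    # Optional filter: active => 1 or active => 0.
    sub admin_user_list {
        my ( $self, %opt ) = @_;
        my $sql = 'SELECT id, name, email FROM users';
        my @bind;
        if ( defined $opt{active} ) {
            $sql .= ' WHERE active = ?';    # placeholder, never interpolation
            push @bind, $opt{active} ? 1 : 0;
        }
        $sql .= ' ORDER BY name';
        return $self->{dbh}->selectall_arrayref( $sql, { Slice => {} }, @bind );
    }
}
```

Once the page code calls admin_user_list instead of building SQL strings inline, the integration tests you already wrote keep guarding the behavior while the SQL moves.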

Don't Do Unit Testing (Yet)

Note that we've been talking about integration testing but not unit testing. There's a very good reason for that: with heavy refactoring of a legacy system, your units will change quite heavily when you first start, but the integration tests focusing on the rather static interfaces will not. You want to spend time refactoring your application, not your tests, so until you've stabilized how the code works internally, unit tests can actually be a distraction.

Integration testing has the advantage that you can cover (if not actually test) huge portions of your code at once and if done correctly, can be very fast to write. Further, with poorly structured applications, unit testing may be very difficult, if not impossible.

Integration testing will also help uncover bugs that unit testing cannot: bugs where different components have different expectations when talking to one another. However, there are some downsides to integration testing:

Integration tests run slower than unit tests
Bugs are harder to track down
It's easier to break real things if you've not isolated your code well enough

That being said, the advantage of integration testing at this stage is clear: refactoring is much easier when you have some basic tests to protect against the worst errors. It's also worth keeping in mind that if you've done little to no testing before this, you're significantly better off with some solid tests than with none. Don't obsess too much over this point: you don't want perfect to be the enemy of the good.

If you haven't already implemented a continuous integration (CI) system, this is the time to start. Even if your developers forget to run the tests, your CI system shouldn't. You want to find out fast if tests are failing.

Pushing Forward

After you've started refactoring one functional piece of a small part of your system, you'll probably quickly uncover some bad assumptions made in the original plan. That's OK. You've started small to minimize your risk. Correct those bad assumptions and then start integration tests with code coverage for another small part of your system, pulling out the functional portions (database calls, HTML, or whatever) that you've already been working on. When you feel comfortable that you've shaken out some of the worst issues, start looking at another functional bit of the system that your currently tested code shares and see if you can pull that out.

Note that this is where your expert's architectural skills are going to shine. They'll understand the importance of decoupling different functional portions of the application. They'll understand how to write robust, flexible interfaces. They'll learn to recognize patterns in your business logic which can be abstracted out. Do not hand this responsibility over to an existing programmer unless you are absolutely confident they have the skills and experience necessary to get this done.

At this point, what follows is a wash/rinse/repeat cycle which in worst case scenarios can take years to finish. It takes a long time, but it has some significant advantages:

The code is always working
You're not paying to maintain two systems at the same time
Business knowledge is not lost
New features can still be added
Tests can now be easily written to target existing bugs (even if you don't refactor that code yet)
You can always stop if you've made your codebase "good enough"

Why does this approach work? Any large project can seem daunting, but by breaking it down into smaller, manageable pieces, you can at least know where to start and get a sense of where you are going without the nail-biting worry about whether or not a huge project is going to fail.

When I've used this technique before, I've often found that it's a pleasure to finally have a cleaner sense of how the code is evolving, and the current experienced team doesn't face the demoralizing prospect of watching their jobs disappear. The downside of this technique is that while code quality improves tremendously, there's always a feeling that it's not good enough. However, as I previously alluded to, many rewritten systems merely create new design flaws to replace old ones. This is far too common a problem, and it means swapping known problems for unknown ones.

For more information, you can watch this presentation I've given on the topic:

Conclusion

The above strategy isn't appealing to many people and it can be a hard sell to those who are convinced that newer is better. In fact, in many respects it can be viewed as boring (though I love refactoring code), but I've successfully used this approach on multiple legacy codebases. If you're still trying to decide between a rewrite and a refactor, keep in mind that this is a relatively low-cost, low-risk approach. If it proves unworkable, you've likely risked very little. If a rewrite proves unworkable, you could cost the company a ton of money.

So the final question to ask yourself is when you should consider fixing your legacy codebase. The only advice I can offer is to suggest that you not wait until the storm before you fix your roof. Fixing a legacy codebase isn't rocket science, but it does require a degree of expert knowledge in how to transform an existing codebase. Sadly, it's not a skill most developers seem interested in acquiring, but then, most don't seem interested in working on legacy codebases in the first place.

In a follow-up post, I'll explain a safe approach if a rewrite cannot be avoided.

Don’t Estimate Project Costs

Ovid — Wed, 28 Apr 2021 08:05:20 +0000

I know, I know. Management posts are boring, especially on a dev site. But many of you will eventually head down that road, or at least have to talk to a manager at some point. And when you do, the question of “how much will this cost to build?” will eventually come up. But that’s the wrong question.

Possibly you’ve not heard of Douglas Hubbard. If you’re interested at all in estimating the value of IT projects, his works should be required reading. In fact, his books are required reading for the Society of Actuaries if you want to pass their exams and become an actuary.

But we’ll come back to him in a moment.

If there’s one key thing I’ve learned in consulting, it’s that potential clients who start discussions by talking about building value for their customers have usually been better clients than those who start conversations with “what’s your rate?” Why? Because the “what’s your rate” clients are focused on controlling costs, not building value. I’ve found that their approach to problems is different: those who pay more attention to minimizing risk than to maximizing value simply have different ideas about how to approach a project.

And you know what? I get it. If you want to go out to dinner and you only have £15 in your pocket, you’re probably not going to a fancy restaurant. You just might grab a pint at a pub and hit a kebab shop after. Budget constraints are real.

But curiously, while development costs on a project are one of the first things we look at, they’re also one of the worst metrics you can use to judge the value of a project. It’s part of a problem that Hubbard refers to as the IT Measurement Inversion. Specifically:

The variables having the greatest impact on project success are the variables we measure the least.

So what sort of variables are there? There are the ongoing development and maintenance costs when the project is delivered. There are marketing costs. There are costs to have your people learn the new system. And so on. The costs are myriad and complex, but if we’re going to judge the value of a new system, which variables matter the most? Those are the ones we should be focusing on.

Hubbard’s work will guide you here. In particular, you might want to consider his book How to Measure Anything (that’s not an affiliate link, by the way), where he teaches you how to break problems down to things that can be measured and how to measure them. You’ll learn quite a bit about statistical analysis and running Monte Carlo simulations. In fact, one of the things you’ll learn is that of the various techniques which have been developed to estimate project success, Monte Carlo simulations win hands down.

In fact, Monte Carlo simulations are so powerful that Hubbard talks about research at NASA showing that, for over 100 projects, the accountants outperformed the subject matter experts (scientists and engineers) on project estimates because the accountants used Monte Carlo simulations rather than other methodologies.

But for the purposes of our article, the key takeaway is that Hubbard has done a lot of this work for you and has nailed down what your real risks are. And it’s almost intuitive once you think of it. Let’s start with the most important.

The greatest risk to project success is whether or not it will be canceled.

That seems so brain-dead obvious that it’s almost insulting to say, but when was the last time you quantified that risk? When was the last time you looked at the potential value of a large project and said “there’s a 25% chance that this project will be canceled” and then factored that into your risk estimates? It’s the single most important variable, but most people ignore it! Can you imagine running a company and discovering your managers are repeatedly ignoring the most valuable information at their disposal?
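Here is a minimal Monte Carlo sketch, in Python, of what folding a cancellation probability into a project's value might look like. It follows the approach Hubbard describes of expressing each uncertain quantity as a 90% confidence interval and sampling it from a normal distribution; the dollar ranges, the five-year horizon, and the rule that a canceled project spends its whole budget and earns nothing are hypothetical simplifications, not figures from any real project.

```python
import random
import statistics

def sample_from_ci(rng, lower, upper):
    """Sample a normal distribution whose 90% confidence interval is [lower, upper]."""
    mean = (lower + upper) / 2
    sd = (upper - lower) / 3.29  # a 90% CI spans roughly +/-1.645 standard deviations
    return rng.gauss(mean, sd)

def simulate_net_value(trials, p_cancel, dev_cost_ci, annual_benefit_ci, years, seed=None):
    """Return one simulated net value per trial."""
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        dev_cost = sample_from_ci(rng, *dev_cost_ci)
        if rng.random() < p_cancel:
            # Canceled: assume the full budget was spent and no benefit ever arrives.
            results.append(-dev_cost)
        else:
            benefit = sum(sample_from_ci(rng, *annual_benefit_ci) for _ in range(years))
            results.append(benefit - dev_cost)
    return results

values = simulate_net_value(
    trials=10_000,
    p_cancel=0.25,                        # the "25% chance" from the text
    dev_cost_ci=(200_000, 400_000),       # hypothetical 90% CI for development cost
    annual_benefit_ci=(50_000, 250_000),  # hypothetical 90% CI for yearly benefit
    years=5,
    seed=42,
)
print(f"mean net value:   {statistics.mean(values):,.0f}")
print(f"chance of a loss: {sum(v < 0 for v in values) / len(values):.1%}")
```

Changing p_cancel and the cost interval independently is a quick way to see which of the two moves the results more for your own estimates.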

The next most important variable is system utilization. This is actually complex and includes how fast your product will roll out, whether people will use it, their rate of adoption, and so on. Again, this seems so obvious given 20/20 hindsight. Sure, your project took six months to develop instead of three, but customers will be using it for many years. When it’s put that way, it’s obvious that estimating system utilization is more valuable than estimating development costs. But people still ignore it.

So let’s look at development costs.

Of course you have to factor in development costs; I’d be insane to suggest otherwise (despite the provocative title of this post). You have a budget and when you’re looking at backing your project with Oracle or PostgreSQL, most of the time you find that Oracle isn’t a terribly cost-effective solution. So yes, costs are important, but consider this: if the first version of Amazon’s code cost twice as much to develop, Jeff Bezos would probably still be insanely rich. Uber would still be dominating their space. Facebook would still be selling your personal data to dodgy companies. I can confidently say that because these companies are providing real value to people (you might disagree about Facebook, but I can offer you 2.32 billion rebuttals). The fact that these projects weren’t canceled and are being used far, far outweighs the initial project costs. Or to restate all of this:

Initial development costs are incurred once. Your revenue stream is continuous. That’s why development costs are less important.
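A worked example makes the asymmetry concrete. This Python sketch uses hypothetical round numbers and the same simplifying assumption as before (a canceled project spends its full budget and earns nothing); it compares doubling the one-time development cost against a 25% chance of cancellation.

```python
def expected_net_value(dev_cost, annual_revenue, years, p_cancel=0.0):
    """Expected net value: development is paid once, revenue arrives every year.

    Simplifying assumption: a canceled project spends its full budget and earns nothing.
    """
    return (1 - p_cancel) * annual_revenue * years - dev_cost

# Hypothetical figures, chosen only to make the arithmetic easy to follow.
baseline    = expected_net_value(dev_cost=100_000, annual_revenue=200_000, years=5)
double_cost = expected_net_value(dev_cost=200_000, annual_revenue=200_000, years=5)
risky       = expected_net_value(dev_cost=100_000, annual_revenue=200_000, years=5, p_cancel=0.25)

print(f"baseline:              {baseline:,.0f}")     # 900,000
print(f"development cost x2:   {double_cost:,.0f}")  # 800,000
print(f"25% cancellation risk: {risky:,.0f}")        # 650,000
```

Under these assumptions, doubling the development cost loses 100,000 of expected value, while a 25% cancellation risk loses 250,000.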

“But Ovid, I don’t know how to do Monte Carlo simulations or estimate projected system utilization!”

Well ... that’s actually a fair point. These articles are intended, if nothing else, to be practical.

Since the two most important variables in project success are the odds of it being canceled and system utilization, we need to reduce the risk of each of those. How do we do that?

Simply put, you go lean/agile. Figure out the smallest piece of work you can do that would add value and build that. Instead of a year-long project, you have a crazy one-month race to get a simple prototype in front of your customers. If it turns out your customers have zero interest in your “Recycled Food” project, it’s better to find out after a month instead of a year.

But let’s say there’s some interest. What now? Well, you’ve greatly reduced the chance of project cancelation because it looks viable, so you need to focus on system utilization. Amongst the key factors, getting customers to actually use the damned thing is incredibly important. You will be amazed at the ideas they come up with and the feedback they give you. As you push forward, you address customer concerns and have a lower risk of building features on top of things they say aren’t working. You’re constantly pushing new versions of the product in front of your customers and iterating based on their feedback. Every step of the way, you’re improving your project based on customer behavior, not your hunches or market research.

This, incidentally, is why Lean Startups are such a powerful movement in business today. They assume they’re not canceling their entire business, so they need to maximize system utilization. That means rolling the product out quickly and immediately responding to customer feedback. By doing this, assuming your product isn’t canceled, you’ve now reduced the single greatest risk your project faces.

Stop worrying about how much the project costs to develop. Worry about building a great product.

And if you'd like to learn more about this way of looking at problems, hit a search engine and search for noestimates.

If you have any thoughts about this, please leave a comment below.

Cover Photo by Nataliya Vaitkevich from Pexels

Use Immutable Objects

Ovid — Tue, 20 Apr 2021 06:54:25 +0000

Immutable Objects

I’ve been spending time designing Corinna, a new object system to be shipped with the Perl language. Amongst its many features, it’s designed to make it easier to create immutable objects, but not everyone is happy with that. For example, consider the following class:

class Box {
    has ($height, $width, $depth) :reader :new;
    has $volume :reader = $width * $height * $depth;
}

my $original_box = Box->new(height=>1, width=>2, depth=>3);
my $updated_box  = $original_box->clone(depth=>9);  # h=1, w=2, d=9

Because none of the slots have a :writer attribute, there is no way to mutate this object. Instead, you call a clone method, supplying an overriding value for each constructor argument you need to change. The $volume slot doesn’t get copied over: because it’s derived from the constructor arguments, it’s recalculated for the new object.
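The following is not Corinna; it's a rough Python analogue of the same idea, using a frozen dataclass: no setters, a clone method that takes overrides, and a derived volume that's recalculated rather than copied. The clone name simply mirrors the Perl example, and dataclasses.replace does the actual work.

```python
from dataclasses import FrozenInstanceError, dataclass, field, replace

@dataclass(frozen=True)
class Box:
    height: float
    width: float
    depth: float
    # Derived slot: not a constructor argument, so replace() never copies it.
    volume: float = field(init=False)

    def __post_init__(self):
        # frozen=True blocks normal assignment, so set the derived value directly.
        object.__setattr__(self, "volume", self.height * self.width * self.depth)

    def clone(self, **overrides):
        """Return a new Box with some constructor arguments overridden."""
        return replace(self, **overrides)

original_box = Box(height=1, width=2, depth=3)
updated_box = original_box.clone(depth=9)  # h=1, w=2, d=9
print(original_box.volume, updated_box.volume)  # 6 18

try:
    original_box.depth = 9
except FrozenInstanceError:
    print("no writer: use clone() instead")
```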

But not everyone is happy with this approach. Aside from arguments about the utility of the clone method, the notion that objects should be immutable by default has frustrated some developers reading the Corinna proposal. Even when I point out that adding a :writer attribute is all you need to do to get mutability, people still object. So let’s have a brief discussion about immutability and why it’s useful.

But first, here’s my last 2020 Perl Conference presentation on Corinna (at the time, called "Cor").

The Problem

Imagine, for example, that you have a very simple Customer object:

my $ovid = Customer->new(
    name      => "Ovid",
    birthdate => DateTime->new( ... ),
);

In the code above, we’ll assume that $ovid can give us useful information about the state of that object. For example, we have a section of code guarded by a check to see if the customer is old enough to drink alcohol:

if ( $ovid->old_enough_to_drink_alcohol ) {
    ...
}

The above looks innocent enough and it’s the sort of thing we regularly see in code. But then this happens:

if ( $ovid->old_enough_to_drink_alcohol ) {
    my $date = $ovid->birthdate;
    ...
    # deep in the bowels of your code
    my $cutoff_date = $date->set( year => $last_year ); # oops!
    ...
}

We had a guard to ensure that this code would not be executed if the customer wasn’t old enough to drink, but now in the middle of that code, due to how DateTime is designed, someone’s set the customer birth date to last year! The code, at this point, is probably in an invalid state and its behavior can no longer be considered correct.

But clearly no one would do something so silly, would they?

Global State

We’ve known about the dangers of global state for a long time. For example, if I call the following subroutine, will the program halt or not?

use v5.36;    # enables strict, warnings, say, and subroutine signatures

sub next_number ($number) {
    if ( $ENV{BLESS_ME_LARRY_FOR_I_HAVE_SINNED} ) {
        die "This was a bad idea.";
    }
    return $number + 1;
}

You literally cannot inspect the above code and tell me if it will die when called, because you cannot know, by inspection, what the BLESS_ME_LARRY_FOR_I_HAVE_SINNED environment variable is set to. This is one of the reasons why global environment variables are discouraged.

But here we’re talking about mutable state. You don’t want the above code to die, so you do this:

$ENV{BLESS_ME_LARRY_FOR_I_HAVE_SINNED} = 0;
say next_number(4);

Except that now you’ve altered that mutable state, and anything else which relies on that environment variable being set is unpredictable. So we need to use local to safely change it in the local scope:

{
    local $ENV{BLESS_ME_LARRY_FOR_I_HAVE_SINNED} = 0;
    say next_number(4);
}

Even that isn’t great, because there’s no indication of why we’re doing this, but at least you can see how to safely change that global variable in a local scope.
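Python has no direct equivalent of local, but the same save, change, and restore discipline can be sketched with a context manager. This is an illustrative analogue, not a Perl idiom. Environment variables are always strings in Python, so the hypothetical perl_true helper treats the empty string and "0" as false to mirror Perl's truthiness, and next_number is a Python rendering of the subroutine in the example.

```python
import os
from contextlib import contextmanager

FLAG = "BLESS_ME_LARRY_FOR_I_HAVE_SINNED"

def perl_true(value):
    """Mirror Perl truthiness for strings: undef, "" and "0" are false."""
    return value not in (None, "", "0")

def next_number(number):
    if perl_true(os.environ.get(FLAG)):
        raise RuntimeError("This was a bad idea.")
    return number + 1

@contextmanager
def local_env(name, value):
    """Temporarily set an environment variable, restoring its previous state on exit."""
    previous = os.environ.get(name)  # None means "was not set"
    os.environ[name] = value
    try:
        yield
    finally:
        if previous is None:
            os.environ.pop(name, None)
        else:
            os.environ[name] = previous

with local_env(FLAG, "0"):
    print(next_number(4))  # 5
```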

ORMs

And I can hear your objection now:

“But Ovid, the DateTime object in your first example isn’t global!”

That’s true. What we had was this:

if ( $ovid->old_enough_to_drink_alcohol ) {
    my $date = $ovid->birthdate;
    ...
    # deep in the bowels of your code
    my $cutoff_date = $date->set( year => $last_year ); # oops!
    ...
}

But the offending line should have been this:

    # note the clone().
    my $cutoff_date = $date->clone->set( year => $last_year );

This is because the set method mutates the object in place, causing everything holding a reference to that object to silently change. It’s not global in the normal sense, but this action at a distance is a source of very real bugs.
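The same action at a distance is easy to reproduce outside Perl. In this Python sketch, MutableDate is a hypothetical stand-in for a DateTime-style object whose set method modifies the object and returns it, while Python's built-in datetime.date is immutable, so its replace method returns a new object and leaves the original alone. The birth year is made up for illustration.

```python
from datetime import date

class MutableDate:
    """Hypothetical mutable date, mimicking an in-place set() that returns self."""
    def __init__(self, year, month, day):
        self.year, self.month, self.day = year, month, day

    def set(self, **changes):
        for key, value in changes.items():
            setattr(self, key, value)
        return self

class Customer:
    def __init__(self, name, birthdate):
        self.name = name
        self.birthdate = birthdate

# Mutable: the "new" cutoff date is the customer's own birthdate object, now changed.
ovid = Customer("Ovid", MutableDate(1970, 1, 1))
cutoff_date = ovid.birthdate.set(year=2020)
print(ovid.birthdate.year, cutoff_date is ovid.birthdate)  # 2020 True

# Immutable: replace() builds a new date, and the customer is untouched.
ovid = Customer("Ovid", date(1970, 1, 1))
cutoff_date = ovid.birthdate.replace(year=2020)
print(ovid.birthdate.year, cutoff_date.year)  # 1970 2020
```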

It’s a serious enough problem that DateTime::Moonpig and DateTimeX::Immutable have both been written to provide immutable DateTime objects, and that brings me to DBIx::Class, an excellent ORM for Perl.

As of this writing, it’s been around for about 15 years and provides a component called DBIx::Class::InflateColumn::DateTime. This allows you to do things like this:

package Event;
use base 'DBIx::Class::Core';

__PACKAGE__->load_components(qw/InflateColumn::DateTime/);
__PACKAGE__->add_columns(
  starts_when => { data_type => 'datetime' },
  create_date => { data_type => 'date' },
);

Now, whenever you call starts_when or create_date on an Event instance, you’ll get a DateTime object instead of just the raw string from the database. Further, you can set a column to a DateTime object and not worry about your particular database’s date syntax. It just works.

Except that the object is mutable and we don’t want that. You can fix this by writing your own DBIx::Class component to use immutable DateTime objects.

package My::Schema::Component::ImmutableDateTime;

use DateTimeX::Immutable;
use parent 'DBIx::Class::InflateColumn::DateTime';

sub _post_inflate_datetime {
    my ( $self, @args ) = @_;
    my $dt = $self->next::method(@args);
    return DateTimeX::Immutable->from_object( object => $dt );
}

1;

And then load this component:

__PACKAGE__->load_components(
    qw/+My::Schema::Component::ImmutableDateTime/
);

And now, when you fetch your objects from the database, you get nice, immutable DateTimes. And it will be interesting to see where your codebase fails!

Does all of this mean we should never use mutable objects? Of course not. Imagine creating an immutable cache where, if you wanted to add or delete an entry, you had to clone the entire cache to set the new state. That would likely defeat the main purpose of a cache: speeding things up. But in general, immutability is a good thing and is something to strive for. Trying to debug why code far, far away from your code has reset your data is not fun.

A Tiny Note About Interfaces

Ovid — Mon, 12 Apr 2021 07:02:42 +0000

A client wanted to take their current monolithic application and break it up into a series of interconnected services. They wanted to go down the Service Oriented Architecture (SOA) road, but they had made some curious choices regarding what they wanted to expose in their APIs.

That got me to thinking about my daughter, Lilly-Rose. When she was rather young, I had an Android phone and she liked to play with that when we were riding in the car. One day we were driving along when she noticed a new icon on the phone. It had her face. And her name.

“Papa, what’s that?”

“I don’t know. Why don’t you click it?”

So she did. And this is what she saw.

She smiled, but looked confused and showed me the phone (I was in the back with her). “It just says ‘click me’.”

“So, why not?”

So she clicked it and saw something like this:

And then she clicked it again and saw something like this:

She kept getting weird things like “the cats are waiting for us to make a mistake before they take over” or “what do lawyers wear to court? Lawsuits.” (I had to explain that one) and she was laughing and having a blast. Every time she pressed the button she'd get a random joke on a random background color. Chalk that up as a win for papa.

So what does this have to do with APIs? You can think of that application as an API, albeit for my daughter and not another piece of software. I created it so that even my young daughter could figure out how to use it. I even got a bug report from her! (“Papa, it doesn’t work!”) It took me a while to realize that sometimes she would get the same joke and the same background color twice in a row, so it looked like nothing had happened. So I made sure she’d never get the same joke twice in a row.

Now you, as a software developer, might wonder “what programming language did Ovid use?” or “are the jokes stored in a database?” My daughter didn’t care and neither should you. The technology used was the scaffolding to build the application. The application’s interface should hide the scaffolding.

Exposing the scaffolding is making an implicit promise that it can be relied on. People will be unhappy if you break that promise.

To be honest, this is so trivial that it seems ridiculous to write a post about it, but I've seen it violated so many times that I have to. What language did I use? Who cares? So long as the app behaves the same, my daughter would still laugh. Were the jokes hard-coded, stored in a database, or fetched from a remote service? Again, it doesn't matter. And it shouldn't be exposed in the interface.

It’s not that those technical decisions are unimportant. It’s that the end consumer shouldn’t have to be aware of them. When you hide them, you can change them, or fix them. When you expose them, whether it be in an application or an API, consumers of what you’re providing learn to rely on those details and then you get stuck providing them. So don’t do that.
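One way to keep the scaffolding out of an interface is to have callers depend on a small, behavior-only contract and keep storage choices behind it. The Python sketch below is hypothetical (JokeStore, HardCodedJokes, and JokeService are illustrative names); a database-backed or remote store could replace HardCodedJokes without changing anything callers see.

```python
import random
from dataclasses import dataclass
from typing import Optional, Protocol, Sequence

@dataclass(frozen=True)
class Joke:
    """What callers receive: the joke itself, with no storage details attached."""
    text: str

class JokeStore(Protocol):
    """Internal detail: where the jokes live. JokeService callers never see it."""
    def all_jokes(self) -> Sequence[str]: ...

class HardCodedJokes:
    """One possible store; a database-backed one could replace it later."""
    def __init__(self, jokes: Sequence[str]):
        self._jokes = list(jokes)

    def all_jokes(self) -> Sequence[str]:
        return self._jokes

class JokeService:
    """The public interface: ask for a joke, get a joke."""
    def __init__(self, store: JokeStore, rng: Optional[random.Random] = None):
        self._store = store
        self._rng = rng or random.Random()

    def random_joke(self) -> Joke:
        return Joke(self._rng.choice(self._store.all_jokes()))

service = JokeService(HardCodedJokes(["What do lawyers wear to court? Lawsuits."]))
print(service.random_joke().text)
```

Note that Joke carries no row ID, file path, or table name; exposing any of those would be exactly the kind of implicit promise that's hard to take back.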

For those who must know the technical details, here’s the core of the app, written in Kotlin:

package io.github.ovid.lilly_roserules

import android.graphics.Color
import androidx.appcompat.app.AppCompatActivity
import android.os.Bundle
import android.widget.Button
import android.widget.TextView
import kotlin.random.Random
import androidx.constraintlayout.widget.ConstraintLayout

class MainActivity : AppCompatActivity() {
    internal lateinit var layout: ConstraintLayout

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_main)

        layout = findViewById(R.id.mainlayout)

        val rollButton    = findViewById<Button>(R.id.rollButton)
        val quoteTextView = findViewById<TextView>(R.id.quoteTextView)

        val messages      = getMessages()
        val colors        = getColors()
        var lastChoice    = -1  // -1: nothing has been shown yet

        rollButton.setOnClickListener {
            var thisChoice = lastChoice  // start equal so the loop always rolls at least once

            // make sure we never get the same message twice in a row
            while (thisChoice == lastChoice) {
                thisChoice = Random.nextInt(messages.size)
            }
            lastChoice = thisChoice

            // picks a random message
            quoteTextView.text = messages[thisChoice]

            // picks a random background color
            val randColor = Random.nextInt(colors.size)
            layout.setBackgroundColor(Color.parseColor(colors[randColor]))
        }
    }

    fun getColors():Array<String> {
        val colors = arrayOf(
            // list of colors
        )
        return colors
    }

    fun getMessages():Array<String> {
        val messages = arrayOf(
            // list of jokes
        )
        return messages
    }
}
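The Kotlin version avoids repeats by re-rolling in a loop, which never terminates if the list holds only one message. A loop-free alternative, sketched here in Python rather than Kotlin, draws from one fewer slot and skips over the previous choice, so every other message stays equally likely. The pick_different name and the None-for-first-press convention are assumptions of this sketch.

```python
import random
from typing import Optional

def pick_different(n: int, last: Optional[int], rng: random.Random) -> int:
    """Pick a random index in range(n) that differs from `last`.

    Draws once from n - 1 slots and shifts values at or above `last` up by one,
    so every index except `last` is equally likely and no retry loop is needed.
    """
    if last is None:  # first press: any message will do
        return rng.randrange(n)
    if n < 2:
        raise ValueError("need at least two messages to avoid a repeat")
    choice = rng.randrange(n - 1)
    if choice >= last:
        choice += 1
    return choice

rng = random.Random()
last = None
for _ in range(5):
    last = pick_different(10, last, rng)
    print(last)
```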

It was dead simple and thrown together quickly to amuse my daughter. But if I ever needed to make it “production ready,” all the important bits were hidden away. Because this was an application on a smartphone and not an API, this sounds silly, but I’m hard-pressed to find a better analogy to explain this mistake I see made over and over again.

You Need Measurable Goals

Ovid — Thu, 08 Apr 2021 05:30:06 +0000

One thing I try to reinforce with clients, particularly those who have "big" projects, is that they need measurable success criteria. Sometimes that's dangerous because numbers can be gamed—such as the support manager who won an award for reducing support calls, only to find he had hidden the support phone number on our web site—but if you have measurable goals, you can use that to declare success or failure, rather than punting on the problem.

Case in point is my favorite "successful" project: I was with a company that decided to rewrite a massive Perl system in C++ because Perl was too slow.

Already, I know several of you want to know what "too slow" means in this context, but no, you gotta wait for the punchline.

This project was written years ago and was thus "legacy" code. It was clunky, but critical. Honestly, fixing it would have been cheaper and faster than a rewrite, but you probably know that. The dangers of rewrites are well-known.

But it didn't matter. Perl was too slow, so C++ it was.

And unlike many rewrites, this one was finished. It took them years, but they finished. The new code was a bit of a pig, but they finally lifted the millstone of Perl's performance from around their neck.

But there was a small problem. Turns out the C++ system, years in development, a steaming pile of ones and zeros, and possibly just a wee bit unmaintainable ...

... didn't run any faster than the Perl system it replaced.

You know what makes software slow? Network congestion. I/O issues. SOAP instead of JSON. RPC. Oh, and the database. It's always the database. Unless a system is CPU-bound, changing the programming language will often gain you no performance improvements!
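Amdahl's law puts numbers on this. If only a fraction p of the runtime is CPU work that a faster language speeds up by a factor s, the overall speedup is 1 / ((1 - p) + p / s). A short Python sketch; the 10% CPU share and the 10x factor are hypothetical figures for an I/O-heavy system, not measurements of the system in this story.

```python
def amdahl_speedup(cpu_fraction: float, cpu_speedup: float) -> float:
    """Overall speedup when only `cpu_fraction` of the runtime gets `cpu_speedup` faster."""
    return 1 / ((1 - cpu_fraction) + cpu_fraction / cpu_speedup)

# Hypothetical: 10% of the time is CPU-bound, and the new language runs that part 10x faster.
print(f"{amdahl_speedup(0.10, 10):.2f}x")  # 1.10x
# Even an infinitely fast language can't beat 1 / (1 - p) here:
print(f"{1 / (1 - 0.10):.2f}x")            # 1.11x
```

Profiling first tells you what p actually is, which is exactly the number nobody had.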

The new project was started years ago, so when I asked, no one was sure if the system that was "too slow" had any performance profiling done to find the bottlenecks.

But management crowned it a success anyway, because now that it was done, no one wanted to admit that its only measurable success criteria showed it to be a complete failure.

When I left the company there was talk—I'm not kidding—about rewriting the system in Perl.

Sane Database Change Management

Ovid — Mon, 05 Apr 2021 09:52:08 +0000

This happens all the time when dealing with new clients:

Me: How do you update the database?
Client: Check the 'sql' directory to see if there's new SQL in there. Run that.
Me: Manually? Are you serious?
Client: Yes. We're used to it.
Me: Um, OK. So how do you back out the change if there's a problem?
Client: There's hardly ever a problem, but we fix it by hand.

And things go downhill from there. I frequently meet clients with insane database migration strategies. "Dump SQL in a directory and have people apply it" is an annoyingly common strategy. There's no clear way to roll it back, and you can't declare dependencies. If you're using a database like MySQL or Oracle, DDL changes aren't transaction-safe, so they really should be in their own migration, but they're not. I even had one client who emailed developers to let them know which SQL to apply.

A few clients have an in-house database migration strategy involving numbered migrations. It often looks like this:

...
213-up-add-index-on-address-state.sql
213-down-add-index-on-address-state.sql
214-up-add-customer-notes-table.sql
214-down-add-customer-notes-table.sql
215-up-add-sales-tax-sproc.sql
215-down-add-sales-tax-sproc.sql

That, at least, allows devs to back out changes (though that's tricky if your DDL isn't transaction-safe), but it's amazing when you get to migration 215 and you have eight developers, four of whom need to make a database change, all arguing over who gets number 216. Yes, I've seen this happen more than once.

With a naïve numbering strategy, you can't declare dependencies, you get numbering conflicts, you really can't "tag" a particular migration for deployment, and so on.
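Declaring dependencies by name, rather than encoding order in a number, lets a tool work out the deploy order itself, and there's no number to fight over. This Python sketch illustrates the idea with the standard library's graphlib (Python 3.9+); the change names are hypothetical, loosely based on the numbered example above, and this is not how any particular migration tool is implemented.

```python
from graphlib import TopologicalSorter

# Hypothetical changes, each mapped to the set of changes it requires.
requires = {
    "address_state_index": {"addresses"},
    "customer_notes": {"customers"},
    "sales_tax_sproc": {"customers", "addresses"},
    "customers": set(),
    "addresses": set(),
}

# static_order() yields every change after all of its dependencies
# and raises graphlib.CycleError if the dependencies form a cycle.
deploy_order = list(TopologicalSorter(requires).static_order())
print(deploy_order)
```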

Or there are the migration strategies which allow migrations to be written in your favorite programming language. Those are often nice, but they can't always leverage the strength of the database, often generate very poor SQL, and make it hard for other teams not using that language to write migrations.

There's a better way.

Sqitch

The sqitch (pronounced "skitch", not "skwitch") Web site describes sqitch as:

Sensible database-native change management for framework-free development and dependable deployment.

That is, uh, quite the mouthful. But it's accurate. It has great documentation with tutorials for Postgres, SQLite, MySQL, Firebird, Exasol, Oracle, Snowflake, and Vertica. Out of the box, sqitch offers sane, easy-to-use database change management. Since it's both free and open source, it's also easy to hook into and customize, if needed.

sqitchers / sqitch

Sensible database change management

App/Sqitch version v1.4.2-dev


Sqitch is a database change management application. It currently supports:

PostgreSQL 8.4+
YugabyteDB 2.6+
CockroachDB 21+
SQLite 3.8.6+
MySQL 5.1+
MariaDB 10.0+
Oracle 10g+
Firebird 2.0+
Vertica 7.2+
Exasol 6.0+
Snowflake

What makes it different from your typical migration approaches? A few things:

No opinions

Sqitch is not tied to any framework, ORM, or platform. Rather, it is a standalone change management system with no opinions about your database engine, application framework, or development environment.

Native scripting

Changes are implemented as scripts native to your selected database engine. Writing a PostgreSQL application? Write SQL scripts for psql. Writing an Oracle-backed app? Write SQL scripts for SQL*Plus.

Dependency resolution

Database changes may declare dependencies on other changes -- even on changes from other Sqitch projects. This ensures proper order of execution, even when you've committed changes to your VCS out-of-order.

Deployment integrity

…


Sqitch 101

I won’t cover setting up sqitch. It's pretty easy and the tutorials handle it well. Instead, I'll explain how I get teams up-and-running quickly with it.

However, there is one small change I recommend. By default, in the current directory, sqitch will add deploy, revert, and verify directories. Your SQL will go into those directories. I prefer to minimize the number of top-level directories in my projects, so after I have things set up, I usually run a command similar to this (the following assumes PostgreSQL):

sqitch engine alter pg --top-dir sql

That tells sqitch that when you're using the PostgreSQL engine, it should use sql/deploy/, sql/revert/, and sql/verify/ directories. Thus, you only have one top-level sql directory for managing your sqitch files.

And for the sake of what follows, we'll assume that we have an acme_test target that we run our tests against and that we've set that target to be the default (that will make more sense when you've read the docs).

Also, please note that my usage pattern is not quite the same as what's taught in the tutorials. Instead, it's designed to be an easy workflow for any developer to understand.

After you have sqitch set up and have your initial schema added to sqitch, you can add a change. The basic pattern is "create a branch", "add sql changes", "modify code as needed", "commit" and merge back.

To be more specific, let's say that you want to add a title column to the customers table (note that the name passed to the sqitch add command is arbitrary, but I recommend you pick a naming convention and stick to it).

Create a branch in your source control
Run sqitch add customers/title
Edit sql/deploy/customers/title.sql and sql/revert/customers/title.sql to add your SQL
Run sqitch deploy to deploy those changes
Edit your code if needed
Run your tests
Commit your changes
Merge back to your main branch

(Note: for the first and last steps, if you're using git, see my easy git workflow)

When doing this, it's also a good idea to revert the sqitch change(s) you've added and redeploy. This makes it easier to spot the case where your revert file doesn't properly revert the deploy:

sqitch rebase --onto @HEAD^ -y

Alternatively, if you're not comfortable with the rebase command:

alias bounce='sqitch revert --to @HEAD^ -y && sqitch deploy'
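Sqitch's own commands are the right tool for this; the Python sketch below only illustrates what the revert-and-redeploy check is looking for. It uses the built-in sqlite3 module with an in-memory database and a hypothetical customer_notes change, snapshots the schema from sqlite_master, and fails if reverting doesn't restore the original schema or redeploying doesn't reproduce the deployed one.

```python
import sqlite3

# Hypothetical deploy and revert scripts for a single change.
DEPLOY = """
CREATE TABLE customer_notes (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    note        TEXT    NOT NULL
);
"""
REVERT = "DROP TABLE customer_notes;"

def schema(conn):
    """Snapshot every table, index, view, and trigger definition."""
    return conn.execute(
        "SELECT type, name, sql FROM sqlite_master ORDER BY type, name"
    ).fetchall()

def bounce(conn, deploy_sql, revert_sql):
    """Deploy, revert, and redeploy, checking that each step round-trips."""
    before = schema(conn)
    conn.executescript(deploy_sql)
    deployed = schema(conn)
    conn.executescript(revert_sql)
    if schema(conn) != before:
        raise AssertionError("revert did not restore the original schema")
    conn.executescript(deploy_sql)
    if schema(conn) != deployed:
        raise AssertionError("redeploy produced a different schema")
    return deployed

conn = sqlite3.connect(":memory:")
bounce(conn, DEPLOY, REVERT)
print("deploy/revert round-trip OK")
```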

Every change goes through the same pattern. It's almost like working with git, with a steady queue of changes adding up. This is a simpler pattern than what is explained in the sqitch docs, but you don't have to explain reworking or rebasing changes. There are some presentations you might want to watch if you'd like to learn more.

Sqitch was also highlighted on FLOSS Weekly:

So What?

First and foremost, you get to write your database changes in SQL, not in some "DSL" that you're provided with. You can leverage the full power of your database. Often, I find SQL generators produce poor SQL, or simply won't produce the SQL that I need. When was the last time your DSL let you create optimizer hints?

However, if you do need more than just SQL, it's easy enough to write sqitch middleware to intercept the call to the database with your own wrapper:

sqitch engine add pg --client /path/to/my/middleware

I've done this for a client who wanted to write database changes using Percona's excellent (and free) pt-online-schema-change tool.

Also, what happens if you get a conflict with git and the sqitch.plan file? It's easy to have a bad rebase and fix it incorrectly. Internally, sqitch uses checksums to determine the changes you've applied and their order, so a bad rebase won't allow you to accidentally apply the wrong changes.
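A hash-based check catches a bad merge because each change's identity can depend on the change before it. The Python sketch below shows the general idea with a simple SHA-1 hash chain over change names; it illustrates the technique only, it is not Sqitch's actual change-ID format, and the change names are hypothetical.

```python
import hashlib
from typing import List, Optional

def chain_ids(changes: List[str]) -> List[str]:
    """Give each change an ID that hashes its name together with its parent's ID."""
    ids, parent = [], ""
    for name in changes:
        parent = hashlib.sha1(f"{parent}\n{name}".encode()).hexdigest()
        ids.append(parent)
    return ids

def first_divergence(plan: List[str], deployed_ids: List[str]) -> Optional[int]:
    """Return the index where deployed history stops matching the plan, or None."""
    for i, (expected, actual) in enumerate(zip(chain_ids(plan), deployed_ids)):
        if expected != actual:
            return i
    return None

deployed = chain_ids(["customers", "customers/title", "customer_notes"])
# A bad merge swapped two lines in the plan file:
bad_plan = ["customers", "customer_notes", "customers/title"]
print(first_divergence(bad_plan, deployed))  # 1
```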

Conclusion

I've barely scratched the surface of what you can do with sqitch. It's amazing how many other database change management systems get this wrong. When I switch teams over to sqitch, most of their database development pain just goes away. If you're having trouble with database migrations, try sqitch. You won't regret it.

Cover Photo by Tobias Fischer

The Zen of Test Suites

Ovid — Tue, 30 Mar 2021 07:23:47 +0000

The Zen of Application Test Suites

This is a long read, but it's an important one about one of the most common problems I see with my clients: they all have broken test suites. Learning testing is as much a skill as learning coding and this long article only scratches the surface.

Much of what I describe below is generic and applies to test suites written in any programming language, despite many examples being written in Perl.

Introduction

I often speak with developers who have taken a new job and describe a Web site built out of a bunch of separate scripts scattered randomly through directories, lots of duplicated code, poor use of modules, with embedded SQL and printing HTTP headers and HTML directly. The developers shake their heads in despair, but grudgingly admit an upside: job security. New features are time-consuming to add, changes are difficult to implement and may have wide-ranging side-effects, and reorganizing the codebase to have a proper separation of concerns, to make it cheaper and safer to hack on, will take lots and lots of time.

A bunch of randomly scattered scripts, no separation of concerns, lots of duplicated code, poor use of modules, SQL embedded directly in them? Does this sound familiar? It's your standard test suite. We're horrified by this in the code, but don't bat an eyelash at the test suite.

Part of this is because many, if not most, of the testing examples we find focus on testing distributions, not applications. If you were to look at the tests for my module DBIx::Class::EasyFixture, you'd see the following tests:

00-load.t
basic.t
definitions.t
groups.t
load_multiple_fixtures.t
many_to_many.t
no_transactions.t
one_to_many.t
one_to_one.t

These tests were added one by one, as I added new features to DBIx::Class::EasyFixture and each *.t file represents (more or less) a different feature.

For a small distribution, this isn't too bad because it's very easy to keep it all in your head. With only nine files, it's trivial to glance at them, or grep them, to figure out where the relevant tests are. Applications, however, are a different story. This is the number of test classes from the Tau Station MMORPG test suite:

$ find t/tests -name '*.pm' | wc -l
589

One codebase I worked on had close to a million lines of code with thousands of test scripts. You couldn't hold the codebase in your head, you couldn't glance at the tests to figure out what went where, nor was grepping necessarily going to tell you, as tests for particular sections of code were often scattered around multiple test scripts. And, of course, I regularly hear a lament common at many shops with larger codebases: where are the tests for feature X? Instead of just sitting down and writing code, the developers are hunting for the tests, wondering if there are any tests for the feature they're working on and, if not, trying to figure out where to put their new tests.

Unfortunately, this disorganization is only the start of the problem.

Large-scale test suites

I've worked with many companies with large test suites and they tend to share some common problems. I list them below in the order I try to address these problems (in other words, roughly easiest to hardest).

Tests often emit warnings
Tests often fail ("oh, that sometimes fails. Ignore it.")
There is little evidence of organization
Much of the testing code is duplicated
Testing fixtures are frequently not used (or poorly used)
Code coverage is spotty
They take far too long to run

Problems are one thing, but what features do we want to see in large-scale test suites?

Tests should be very easy to write and run
They should run relatively quickly
The order in which tests run should not matter
Test output should be clean
It should be obvious where to find tests for a particular piece of code
Testing code should not be duplicated
Code coverage should be able to analyze different aspects of the system

Let's take a look at some of the problems and try to understand their impacts. While it's good to push a test suite into a desirable state, often this is risky if the underlying problems are ignored. I will offer recommendations for resolving each problem, but it's important to understand that these are recommendations. They may not apply to your situation.

Tests often emit warnings

This seems rather innocuous. Sure, code emits warnings and we're used to that. Unfortunately, we sometimes forget that warnings are warnings: there might very well be something wrong.

In my time at the BBC, one of the first things I did was try to clean up all of the warnings. One was a normal warning about use of an undefined variable, but it was unclear to me from the code if this should be an acceptable condition. Another developer looked at it with me and realized that the variable should never be undefined: this warning was masking a very serious bug in the code, but the particular condition was not explicitly tested. By rigorously eliminating all warnings, we found it easier to make our code more correct, and in those places where things were dodgy, comments were inserted into the code to explain why warnings were suppressed. In short: the code became easier to maintain.

Another issue with warnings in the test suite is that they condition developers to ignore warnings. We get so used to them that we stop reading them, even if something serious is going on (on a related note, I often listen to developers complain about stack traces, but a careful reading of a stack trace will often reveal the exact cause of the exception). New warnings crop up, warnings change, but developers conditioned to ignore them often overlook serious issues with their code.

Recommendation: Eliminate all warnings from your test suite, but investigate each one to understand if it reflects a serious issue. Also, some tests will capture STDERR, effectively hiding warnings. Making warnings fatal while running tests can help to overcome this problem.
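A minimal sketch of making warnings fatal in plain Perl, with no extra CPAN modules. `with_fatal_warnings()` and `greeting()` are hypothetical names used only for illustration. A `local $SIG{__WARN__}` handler turns each runtime warning into an exception, so a warning cannot scroll past unnoticed or disappear into captured STDERR. The lexical `use warnings FATAL => 'all';` pragma is another option, though it only affects code compiled within its scope.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Run a block of test code with every runtime warning promoted to an
# exception. Returns '' on success, or the error message on failure.
sub with_fatal_warnings {
    my ($code) = @_;
    local $SIG{__WARN__} = sub { die "Fatal warning: $_[0]" };
    return eval { $code->(); 1 } ? '' : $@;
}

# Hypothetical function under test: interpolating undef triggers a
# "Use of uninitialized value" warning under 'use warnings'.
sub greeting {
    my ($name) = @_;
    return "Hello, $name";
}

my $clean = with_fatal_warnings( sub { my $s = greeting('Ovid') } );
my $dirty = with_fatal_warnings( sub { my $s = greeting(undef) } );

print $clean eq '' ? "clean call passed\n" : "clean call failed: $clean";
print $dirty =~ /uninitialized/
  ? "warning was caught as a failure\n"
  : "warning slipped through\n";
```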

Tests often fail ("oh, that sometimes fails. Ignore it.")

For one client, their hour-long test suite had many failing tests. When I first started working on it, I had a developer walk me through all of the failures and explain why they failed and why they were hard to fix. Obviously this is a far more serious problem than warnings, but in the minds of the developers, they were under constant deadline pressures and as far as management was concerned, the test suite was a luxury to keep developers happy, not "serious code." As a result, developers learned to recognize these failures and consoled themselves with the thought that they understood the underlying issues.

Of course, that's not really how it works. The developer explaining the test failures admitted that he didn't understand some of them, and with longer test suites that routinely fail, more failures tend to crop up. Developers conditioned to accept failures tend not to notice them. They kick off the test suite, run and grab some coffee and later glance over the results to see if they look reasonable (that's assuming they run all of the tests, something which often stops happening at this point). What's worse, continuous integration tools are often built to accommodate this. From the Jenkins xUnit Plugin page:

Features

Records xUnit tests

Mark the build unstable or fail according to threshold values

In other words, there's an "acceptable" level of failure. What's the acceptable level of failure when you debit someone's credit card, or you're sending their medical records to someone, or you're writing embedded software that can't be easily updated?

Dogmatism aside, you can make a case for acceptable levels of test failure, but you need to understand the risks and be prepared to accept them. However, for the purposes of this document, we'll assume that the acceptable level of failure is zero.

If you absolutely cannot fix a particular failure, you should at least mark the test as TODO so that the test suite can pass. Not only does this help to guide you to a clean test suite, the TODO reason is generally embedded in the test, giving the next developer a clue to what's going on.
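With Test::More, a TODO block looks something like the following; `round_price()` and its bug are hypothetical. The failing assertion inside the block is reported as `not ok ... # TODO`, which the harness does not count as a failure, and the reason string records why the test is expected to fail.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Test::More;

# Hypothetical function with a known bug: it should round to the
# nearest integer, but int() truncates toward zero instead.
sub round_price { return int $_[0] }

is round_price(2.4), 2, 'rounding down works';

TODO: {
    # While $TODO is set, failures are reported as "not ok ... # TODO"
    # and the harness does not count them against the suite.
    local $TODO = 'round_price() truncates instead of rounding';
    is round_price(2.6), 3, 'rounding up works';
}

done_testing;
```

Once the bug is fixed, the harness reports the test as an unexpectedly passing TODO, which is the cue to remove the TODO block.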

Recommendation: Do not allow any failing tests. If tests fail which do not impact the correctness of the application (such as documentation or "coding style" tests), they should be separated from your regular tests in some manner and your systems should recognize that it's OK for them to fail.

There is little evidence of organization

As mentioned previously, a common lament amongst developers is the difficulty of finding tests for the code they're working on. Consider the case of HTML::TokeParser::Simple. The library is organized like this:

lib/
└── HTML
    └── TokeParser
        ├── Simple
        │   ├── Token
        │   │   ├── Comment.pm
        │   │   ├── Declaration.pm
        │   │   ├── ProcessInstruction.pm
        │   │   ├── Tag
        │   │   │   ├── End.pm
        │   │   │   └── Start.pm
        │   │   ├── Tag.pm
        │   │   └── Text.pm
        │   └── Token.pm
        └── Simple.pm

There's a class in there named HTML::TokeParser::Simple::Token::ProcessInstruction. Where, in the following tests, would you find the tests for process instructions?

t
├── constructor.t
├── get_tag.t
├── get_token.t
├── internals.t
└── munge_html.t

You might think it's in the get_token.t test, but are you sure? And what's that strange munge_html.t test? Or the internals.t test? As mentioned, for a small library, this really isn't too bad. However, what if we reorganized our tests to reflect our library hierarchy?

t/
└── tests/
    └── html/
        └── tokeparser/
            ├── simple/
            │   ├── token/
            │   │   ├── comment.t
            │   │   ├── declaration.t
            │   │   ├── tag/
            │   │   │   ├── end.t
            │   │   │   └── start.t
            │   │   ├── tag.t
            │   │   └── text.t
            │   └── token.t
            └── simple.t

It's clear that the tests for HTML::TokeParser::Simple::Token::Tag::Start are in t/tests/html/tokeparser/simple/token/tag/start.t. And you can see easily that there is no file for processinstruction.t. This test organization not only makes it easy to find where your tests are, it's also easy to program your editor to automatically switch between the code and the tests for the code. For large test suites, this saves a huge amount of time. When I reorganized the test suite of the BBC's central metadata repository, PIPs, I followed a similar pattern and it made our life much easier.
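Because the layout mirrors the library, the mapping from code to tests is mechanical enough to script, which is what makes editor integration easy. A minimal sketch, assuming the lowercase `t/tests/` layout shown above; `test_file_for()` and `test_file_for_source()` are hypothetical helpers that an editor binding could call to jump from a module to its tests.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Map a package name to its test file under the lowercase,
# hierarchy-mirroring layout (t/tests/...).
sub test_file_for {
    my ($package) = @_;
    my $path = join '/', map { lc } split /::/, $package;
    return "t/tests/$path.t";
}

# Map a source file under lib/ to the same test file, which is what an
# editor command for "jump to this module's tests" would need.
sub test_file_for_source {
    my ($source) = @_;
    ( my $package = $source ) =~ s{^lib/}{};
    $package =~ s{\.pm$}{};
    $package =~ s{/}{::}g;
    return test_file_for($package);
}

print test_file_for('HTML::TokeParser::Simple::Token::Tag::Start'), "\n";
print test_file_for_source('lib/HTML/TokeParser/Simple.pm'), "\n";
```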

(Note: the comment about programming your editor is important. Effective use of your editor/IDE is one of the most powerful tools in a developer's toolbox.)

Of course, your test suite could easily be more complicated and your top-level directories inside of your test directory may be structured differently:

t
├── unit/
├── integration/
├── api/
└── web/

Recommendation: Organize your test files to have a predictable, discoverable structure. The test suite should be much easier to work with.

Much of the testing code is duplicated

We're aghast that people routinely cut-n-paste their application code, but we don't even notice when people do this in their test code. More than once I've worked on a test suite with a significant logic change and I've had to find this duplicated code and either change it in many places or refactor it so that it's in a single place and then change it. We already know why duplicated code is bad; I'm unsure why we tolerate it in test suites.

Much of my work in tests has been to reduce this duplication. For example, many test scripts list the same set of modules at the top. I did a heuristic analysis of tests on the CPAN and chose the most popular testing modules and that allowed me to change this:

use strict;
use warnings;
use Test::Exception;
use Test::Differences;
use Test::Deep;
use Test::Warn;
use Test::More tests => 42;

To this:

use Test::Most tests => 42;

You can easily use similar strategies to bundle up common testing modules into a single testing module that all of your tests use. Less boilerplate and you can easily dive into testing.

Or as a more egregious example, I often see something like this (a silly example just for illustration purposes):

set_up_some_data($id);
my $object = Object->new($id);

is $object->attr1, $expected1, 'attr1 works';
is $object->attr2, $expected2, 'attr2 works';
is $object->attr3, $expected3, 'attr3 works';
is $object->attr4, $expected4, 'attr4 works';
is $object->attr5, $expected5, 'attr5 works';

And then a few lines later:

set_up_some_data($new_id);
my $object = Object->new($new_id);

is $object->attr1, $expected1, 'attr1 works';
is $object->attr2, $expected2, 'attr2 works';
is $object->attr3, $expected3, 'attr3 works';
is $object->attr4, $expected4, 'attr4 works';
is $object->attr5, $expected5, 'attr5 works';

And then a few lines later, the same thing ...

And in another test file, the same thing ...

Put that in its own test function and wrap those attribute tests in a loop. If this pattern is repeated in different test files, put it in a custom test library:

sub test_fetching_by_id {
    my ( $class, $id, $tests ) = @_;
    my $object = $class->new($id);

    foreach my $test (@$tests) {
        my ( $attribute, $expected ) = @$test;
        is $object->$attribute, $expected, 
          "$attribute works for $class $id";
    }
}

And then you call it like this:

my @id_tests = (
    { id => $id,
      tests => [
        [ attr1 => $expected1 ],
        [ attr2 => $expected2 ],
        [ attr3 => $expected3 ],
        [ attr4 => $expected4 ],
        [ attr5 => $expected5 ],
    ]},
    { id => $new_id,
      tests  => [
        [ attr1 => $new_expected1 ],
        [ attr2 => $new_expected2 ],
        [ attr3 => $new_expected3 ],
        [ attr4 => $new_expected4 ],
        [ attr5 => $new_expected5 ],
    ]},
);

for my $test ( @id_tests ) {
    test_fetching_by_id(
        'Object',
        $test->{id},
        $test->{tests},
    );
}

This is a cleanly refactored data-driven approach. By not repeating yourself, if you need to test new attributes, you can just add an extra line to the data structures and the code remains the same. Or, if you need to change the logic, you only have one spot in your code where this is done. Once a developer understands the test_fetching_by_id() function, they can reuse this understanding in multiple places. Further, it makes it easier to find patterns in your code and any competent programmer is always on the lookout for patterns because those are signposts leading to cleaner designs.

Recommendation: Keep your test code as clean as your application code.

Testing fixtures are frequently not used (or poorly used)

One difference between your application code and the test suite is that in an application, we often have no idea what the data will be, and we try to have a clean separation of data and code.

In your test suite, we also want a clean separation of data and code (in my experience, this is very hit-or-miss), but we often need to know the data we have. We set up data to run tests against to ensure that we can test various conditions. Can we give a customer a birthday discount if they were born on February 29th? Can a customer with an overdue library book check out another? If our employee number is no longer in the database, is our code properly deleted, along with the backups and the git history erased? (kidding!)

When we set up the data for these known conditions under which to test, we call the data a test fixture. Test fixtures, when properly designed, allow us to generate clean, understandable tests and make it easy to write tests for unusual conditions that may otherwise be hard to analyze.

There are several common anti-patterns I see in fixtures.

Hard to set up and use
Adding them to the database and not rolling them back
Loading all your test data at once with no granularity

In reviewing various fixture modules on the CPAN and for clients I have worked with, much of the above routinely holds true. On top of that, documentation is often rather sparse or non-existent. Here's a (pseudo-code) example of an almost undocumented fixture system for one client I worked with and it exemplified common issues in this area.

load_fixture(
    database => 'sales',
    client   => $client_id,
    datasets => [qw/customers orders items order_items/],
);

This had several problems, all of which could easily have been corrected in code, but they had built a test suite around these problems and backed themselves into a corner, making their test suite dependent on bad behavior.

The business case is that my client had a product serving multiple customers, and each customer would have multiple separate databases. In the above, client $client_id connects to their sales database, and we load several test datasets and run tests against them. However, data was not loaded in a transaction, meaning that there was no isolation between different test cases in the same process. More than once I caught issues where an individual test case would fail when run on its own because it depended on data loaded by a different test case, but it wasn't always clear which test cases were coupled with which.

Another issue is that fixtures were not fine-tuned to address particular test cases. Instead, if you loaded "customers" or "referrals", you got all of them in the database. Do you need a database with a single customer with a single order and only one order item on it to test that obscure bug that occurs when a client first uses your software? There really wasn't any clean way of doing that; data was loaded in an "all or nothing" context. Even if you violated the paradigm and tried to create fine-tuned fixtures, it was very hard to write them due to the obscure, undocumented format needed to craft the data files for them.

Because transactions were not used and changes could not be rolled back, each *.t file would rebuild its own test database, a very slow process. Further, due to lack of documentation about the fixtures, it was often difficult to figure out which combination of fixtures to load to test a given feature. Part of this is simply due to the complex nature of the business rules, but the core issues stemmed from a poor understanding of fixtures. This client now has multiple large, slow test suites, spread across multiple repositories, all of which constantly tear down and set up databases and load large amounts of data. The test suites are both slow and fragile. The time and expense to fix this problem is considerable due to how long they've pushed forward with this substandard setup.

What you generally want is the ability to easily create understandable fixtures which are loaded in a transaction, tests are run, and then changes are rolled back. The fixtures need to be fine-grained so you can tune them for a particular test case.

One attempt I've made to fix this situation is releasing DBIx::Class::EasyFixture, along with a tutorial. It does rely on DBIx::Class, the most popular ORM for Perl. This will likely make it unsuitable for some use cases.

Using fixtures is now very simple:

my $fixtures = DBIx::Class::EasyFixture->new(
  schema => $schema
);
$fixtures->load('customer_with_order_without_items');

# run your tests

For the client's code, we could satisfy the different database requirements by passing in different schemas. Other (well-documented) solutions, particularly those which are pure DBI-based, are welcome in this area.

Recommendation: Use fine-grained, well-documented fixtures which are easy to create and easy to clean up.

Code coverage is poorly understood

Consider the following code:

float recip(float number) {
    return 1.0 / number;
}

And a sample test:

assert(recip(2.0) == 0.5);

Congratulations! You now have 100% code coverage of that function.

For a statically typed language, I'm probably going to be moderately comfortable with that test. Alas, for dynamically typed languages we're fooling ourselves. An equivalent function in Perl will pass that test if we use recip("2 apples") as the argument. And what happens if we pass a file handle? And would a Unicode number work? What happens if we pass no arguments? Perl is powerful and lets us write code quickly, but there's a price: it expects us to know what we're doing and passing unexpected kinds of data is a very common source of errors, but one that 100% code coverage will never (no pun intended) uncover. This can lead to false confidence.

To work around false confidence in your code, always assume that you write applications to create things and you write tests to destroy them. Testing is, and should be, an act of violence. If you're not breaking anything with your tests, you're probably doing it wrong.
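To make that concrete, here is the `recip()` function translated to Perl and probed with the happy-path input plus a few hostile ones. `probe()` is a hypothetical helper that records the result, any exception, and any warnings. The single call with `2.0` already gives full statement coverage, yet the other inputs expose a string that silently "works", a division by zero, and an undefined argument.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# The recip() function from above, translated to Perl.
sub recip {
    my ($number) = @_;
    return 1 / $number;
}

# Call recip() and record what happened: result, error, warnings.
sub probe {
    my ($input) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    my $result = eval { recip($input) };
    return { result => $result, error => $@, warnings => \@warnings };
}

# This single check already gives 100% coverage of recip().
my $happy = probe(2.0);

# Hostile inputs: none of these add coverage, but all reveal problems.
my $apples = probe('2 apples');    # "works" (0.5), but warns
my $zero   = probe(0);             # dies: Illegal division by zero
my $none   = probe(undef);         # warns about undef, then dies

for my $case ( [ '2.0', $happy ], [ '2 apples', $apples ], [ '0', $zero ], [ 'undef', $none ] ) {
    my ( $label, $probe ) = @$case;
    printf "%-9s result=%s warned=%s died=%s\n",
      $label,
      defined $probe->{result} ? $probe->{result} : 'undef',
      @{ $probe->{warnings} } ? 'yes' : 'no',
      $probe->{error} ? 'yes' : 'no';
}
```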

Or what if you have that code in a huge test suite, but it's dead code? We tend to blindly run code coverage over our entire test suite, never considering whether or not we're testing dead code. This is because we slop our unit, integration, API and other tests all together.

Or consider the following test case:

sub test_forum {
    my $test = shift;
    my $site = $test->test_website;
    $site->login($user, $pass);
    $site->get('/forum');
    $site->follow_link( text => 'Off Topic' );
    $site->post_ok({
        title => 'What is this?',
        body  => 'This is a test',
    }, 'We should be able to post to the forum');
}

Devel::Cover doesn't know which code is test code and which is not. Devel::Cover merely tells you if your application code was exercised in your tests. You can annotate your code with "uncoverable" directives to tell Devel::Cover to ignore the following code, but that potentially means sprinkling your code with annotations all over the place.

There are multiple strategies to deal with this. One of the simplest is to merely run your code coverage tools over the public-facing portions of your code, such as web or API tests. If you find uncovered code, you either have code that is not fully tested (in the sense that you don't know if your API can really use that code) or, if you cannot write an API test to reach that code, investigate if it is dead code.

You can do this by grouping your tests into subdirectories:

t/
|--api/
|--integration/
`--unit/

Alternatively, if you use Test::Class::Moose, you can tag your tests and only run coverage over tests including the tags you wish to test:

My::Test::Class::Moose->new({
  include_tags => [qw/api/],
})->runtests;

If you start tagging your tests by the subsystems they are testing, you can then start running code coverage on specific subsystems to determine which ones are poorly tested.

Recommendation: Run coverage over public-facing code and on different subsystems to find poor coverage.

They take far too long to run

The problem with long-running test suites is well known, but it's worth covering this again here. These are problems that others have discussed and that I have also personally experienced many times.

With apologies to XKCD

In the best case scenario for developers who always run that long-running test suite, expensive developer time is wasted while the test suite is running. When they launch that hour-long (or more) test suite, they frequently take a break, talk to (read: interrupt) other developers, check their Facebook, or do any number of things which equate to "not writing software." Yes, some of those things involve meetings or research, but meetings don't conveniently schedule themselves when we run tests and for mature products (those which are more likely to have long-running test suites), there's often not that much research we really need to do.

Here are some of the issues with long-running test suites:

Expensive developer time is wasted while the test suite runs
Developers often don't run the entire test suite
Expensive code coverage is not generated as a result
Code is fragile as a result

What I find particularly curious is that we accept this state of affairs. Even a back-of-the-envelope calculation can quickly show significant productivity benefits that will pay off in the long run by taking care of our test suite. I once reduced a BBC test suite's run time from one hour and twenty minutes down to twelve minutes (Note: today I use a saner approach that results in similar or greater performance benefits). We had six developers on that team. When the test suite took over an hour to run, they often didn't run the test suite. They would run tests on their section of code and push their code when they were comfortable with it. This led to other developers finding buggy code and wasting time trying to figure out how they had broken it when, in fact, someone else broke the code.

But let's assume each developer was running the test suite at least once a day (I'm careful about testing and often ran mine twice a day). By cutting test suite run time by over an hour, we reclaimed a full day of developer productivity every day! Even if it takes a developer a month to increase performance by that amount, it pays for itself many times over very quickly. Why would you not do this? As a business owner, wouldn't you want your developers to save time on their test suite so they can create features faster for you?
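As a back-of-the-envelope check of that claim, using only the numbers given above (80 minutes before, 12 minutes after, six developers, one full run each per day):

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $before_minutes = 80;    # one hour and twenty minutes
my $after_minutes  = 12;
my $developers     = 6;
my $runs_per_day   = 1;     # the conservative assumption above

my $saved_per_run = $before_minutes - $after_minutes;
my $saved_per_day = $saved_per_run * $developers * $runs_per_day;

printf "Saved per run: %d minutes\n", $saved_per_run;
printf "Saved per day: %d minutes (%.1f hours)\n",
  $saved_per_day, $saved_per_day / 60;
```

That works out to 68 minutes per run and 408 minutes (about 6.8 hours) per day across the team, roughly one developer-day; running the suite twice a day doubles it.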

There are several reasons why this is difficult. Tasking a developer with a block of time to speed up a test suite means the developer is not creating user-visible features during that time. For larger test suites, it's often impossible to know in advance just how much time you can save or how long it will take you to reach your goal. In most companies I've worked with, the people who can make the decision to speed up the test suite are often not the people feeling the pain. Productivity and quality decrease slowly over time, leading to the boiling frog problem.

What's worse: in order to speed up your test suite without affecting behavior, the test suite often has to be "fixed" (eliminating warnings, failures, and reducing duplication) to ensure that no behavior has been changed during the refactor.

Finally, some developers simply don't have the background necessary to implement performance optimizations. While profilers such as Perl's Devel::NYTProf can easily point out problem areas in the code, it's not always clear how to overcome the discovered limitations.

The single biggest factor in poor test suite performance for applications is frequently I/O. In particular, working with the database tends to be a bottleneck and there's only so much database tuning that can be done. After you've profiled your SQL and optimized it, several database-related optimizations which can be considered are:

Using transactions to clean up your database rather than rebuilding the database
Only connect to the database once per test suite (hard when you're using a separate process per test file)
If you must rebuild the database, maintain a pool of test databases and assign them as needed, rebuilding used ones in the background
Use smaller database fixtures instead of loading everything at once

After you've done all you can to improve your database access, you may find that your test suite is "fast enough", but if you wish to go further, there are several steps you can take.

Use Test::Aggregate

Test::Aggregate can often double the speed of your test suite (I've had it speed up test suites by around 65%). It does this by taking your separate *.t files and running them in a single process. Not all tests can be run this way (tests that munge global state without cleaning up are prime examples), but it's the easiest way to get a quick boost to test suite performance.

Aggressively search for and remove duplicated tests.

In poorly organized test suites, developers sometimes put tests for a feature in a new *.t file, or add them to an unrelated *.t file, even when related tests already exist. Finding and removing these duplicates can be time-consuming and often does not result in quick wins.

Run Performance Profiling

For one test suite, I found that we were using a pure Perl implementation of JSON. As the test suite used JSON extensively, switching to JSON::XS gave us a nice performance boost. We may not have noticed that if we hadn't been profiling our code with Devel::NYTProf.

Look for code with "global" effects

On one test suite, I ensured that UNIVERSAL::isa and UNIVERSAL::can could not be loaded. It was a quick fix and sped up the test suite by 2% (several small improvements like this can add up quickly).

Inline "hot" functions.

Consider the following code which runs in about 3.2 seconds on my computer:

#!/usr/bin/env perl
use strict;
use warnings;
no warnings 'recursion';

for my $i ( 1 .. 40 ) {
    for my $j ( 1 .. $i**2 ) {
        my $y = factorial($j);
    }
}

sub factorial {
    my $num = shift;
    return 1 if $num <= 1;
    return $num * factorial($num - 1);
}

By rewriting the recursive function as a loop, the code takes about .87 seconds:

sub factorial {
    my $num = shift;
    return 1 if $num <= 1;
    $num *= $_ for 2 .. $num - 1;
    return $num;
}

By inlining the calculation, the code completes in .69 seconds:

for my $i ( 1 .. 40 ) {
    for my $j ( 1 .. $i**2 ) {
        my $y = $j;
        if ( $y > 1 ) {
            $y *= $_ for 2 .. $y - 1;
        }
    }
}

In other words, in our trivial example, the inlined version takes roughly 20% less time than the iterative function and roughly 80% less time than the recursive one.
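The timings above are for whole scripts. To compare variants like these in-process, Perl's core Benchmark module provides `cmpthese()`. A minimal sketch, assuming a much smaller workload (`factorial(20)`) than the script above so each variant runs quickly. Absolute rates vary by machine, so no output is shown. It also checks that the variants agree before timing them.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

sub factorial_recursive {
    my $num = shift;
    return 1 if $num <= 1;
    return $num * factorial_recursive( $num - 1 );
}

sub factorial_iterative {
    my $num = shift;
    return 1 if $num <= 1;
    $num *= $_ for 2 .. $num - 1;
    return $num;
}

# Before timing anything, make sure the variants agree; a faster
# function that returns the wrong answer is not an optimization.
my @mismatches =
  grep { factorial_recursive($_) != factorial_iterative($_) } 0 .. 20;
die "Implementations disagree for: @mismatches\n" if @mismatches;

# Run each variant for at least one CPU second and print a table of
# relative rates. "inlined" does the same work with no function call.
cmpthese( -1, {
    recursive => sub { my $y = factorial_recursive(20) },
    iterative => sub { my $y = factorial_iterative(20) },
    inlined   => sub {
        my $y = 20;
        $y *= $_ for 2 .. $y - 1;
    },
} );
```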

Recompile your Perl

You may wish to recompile your Perl to gain a performance improvement. Many Linux distributions ship with a threaded Perl by default. Depending on the version of Perl you ship with, you can gain performance improvements of up to 30% by recompiling without threads. Of course, if you use threads, you'll feel very stupid for doing this. However, if you don't make heavy use of threads, switching to a forking model for the threaded code may make the recompile worth it. Naturally, you'll need to heavily benchmark your code (preferably under production-like loads) to understand the trade-offs here.
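Before deciding whether a rebuild is worth benchmarking, you can check how your current perl was built. The core Config module exposes the build options, and `useithreads` is set when perl was compiled with interpreter threads. (From the shell, `perl -V:useithreads` reports the same value.)

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Config;

# $Config{useithreads} is 'define' for a threaded build and undef otherwise.
my $threaded = $Config{useithreads} ? 1 : 0;

printf "perl %s %s built with ithreads\n", $^V, $threaded ? 'was' : 'was not';
```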

Preload modules

If your codebase makes heavy use of modules that are slow to load, such as Moose, Catalyst, DBIx::Class and others, preloading them might help. forkprove is a utility written by Tatsuhiko Miyagawa that allows you to preload slow-loading modules and then forks off multiple processes to run your tests. Using this tool, I reduced one sample test suite's run time from 12 minutes to about a minute. Unfortunately, forkprove doesn't allow schedules, a key component often needed for larger test suites. I'll explain that in the next section.
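The idea behind preloading can be sketched with core Perl alone: load modules once in the parent, then fork a child per test so each child starts with that code already compiled. This is a minimal illustration of the technique, not forkprove's implementation. POSIX stands in for a slow-loading module, and the test jobs are hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# "Preload" step: load modules once in the parent. POSIX is a stand-in
# for slow-loading modules such as Moose or DBIx::Class.
use POSIX ();

# Hypothetical test jobs; each returns true on success.
my %test_jobs = (
    floor => sub { POSIX::floor(2.7) == 2 },
    ceil  => sub { POSIX::ceil(2.2) == 3 },
    fmod  => sub { POSIX::fmod( 7, 3 ) == 1 },
);

# Fork one child per job. Each child inherits the parent's already
# compiled code, so no module is loaded a second time.
my %job_for_pid;
for my $name ( sort keys %test_jobs ) {
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    if ( $pid == 0 ) {
        exit( $test_jobs{$name}->() ? 0 : 1 );
    }
    $job_for_pid{$pid} = $name;
}

# Reap every child and record its exit code.
my %exit_code;
while ( ( my $pid = wait ) != -1 ) {
    $exit_code{ $job_for_pid{$pid} } = $? >> 8;
}

printf "%-5s %s\n", $_, $exit_code{$_} == 0 ? 'ok' : 'not ok'
  for sort keys %exit_code;
```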

Parallel tests

Running tests in parallel is tricky. Some tests simply can't be run with other tests. Usually these are tests which alter global state in some manner that other processes will pick up, or might cause resource starvation of some kind.

Or some tests can be run in parallel with other tests, but if several tests are updating the same records in the database at the same time, locking behavior might slow down the tests considerably.

Or maybe you're running 4 jobs, but all of your slowest tests are grouped in the same job: not good.

To deal with this, you can create a schedule that assigns different tests to different jobs, based on a set of criteria, and then puts tests which cannot run in parallel in a single job that runs after the others have completed.
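One simple way to build such a schedule is the greedy "longest first" heuristic: sort tests by how long they took last time and hand each one to whichever job currently has the least work, setting aside tests that cannot run in parallel. A minimal sketch; the timings and file names are made up, and this data structure is not the schedule format that TAP::Parser::Scheduler or Test::Class::Moose uses.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical per-file timings (seconds) from a previous run.
my %duration = (
    'api/orders.t'    => 40,
    'api/users.t'     => 35,
    'web/checkout.t'  => 30,
    'db/migrations.t' => 25,
    'web/login.t'     => 10,
    'unit/money.t'    => 5,
    'unit/dates.t'    => 4,
);

# Tests that alter global state and must run on their own.
my %noparallel = ( 'db/migrations.t' => 1 );

sub build_schedule {
    my ( $jobs, $duration, $noparallel ) = @_;
    my @parallel = map { +{ tests => [], total => 0 } } 1 .. $jobs;
    my @sequential;

    # Longest first; ties broken by name so the schedule is repeatable.
    for my $test ( sort { $duration->{$b} <=> $duration->{$a} || $a cmp $b } keys %$duration ) {
        if ( $noparallel->{$test} ) {
            push @sequential, $test;
            next;
        }

        # Give the test to the job with the least total work so far.
        my $lightest = 0;
        for my $i ( 1 .. $#parallel ) {
            $lightest = $i if $parallel[$i]{total} < $parallel[$lightest]{total};
        }
        push @{ $parallel[$lightest]{tests} }, $test;
        $parallel[$lightest]{total} += $duration->{$test};
    }

    return { parallel => \@parallel, sequential => \@sequential };
}

my $schedule = build_schedule( 2, \%duration, \%noparallel );

for my $i ( 0 .. $#{ $schedule->{parallel} } ) {
    my $job = $schedule->{parallel}[$i];
    printf "job %d (%ds): %s\n", $i + 1, $job->{total}, join ', ', @{ $job->{tests} };
}
print "then, one at a time: @{ $schedule->{sequential} }\n";
```

With these made-up timings and two jobs, the two slowest tests land in different jobs (59 and 65 seconds of work), and `db/migrations.t` runs alone after both finish.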

You can use TAP::Parser::Scheduler to create an effective parallel testing setup. You can use this with TAP::Parser::Multiplexer to create your parallel tests. Unfortunately, as of this writing there's a bug in the Multiplexer whereby it uses select in a loop to read the parser output. If one parser blocks, none of the other output is read. Further, the schedule must be created prior to loading your test code, meaning that if your tests would prefer a different schedule, you're out of luck. Also, make test currently doesn't handle this well. There is work being done by David Golden to alleviate this problem.

My preferred solution is to use Test::Class::Moose. It has built-in parallel testing and writing schedules is very easy. Further, different test cases can simply use a Tags(noparallel) attribute to ensure that they're run sequentially after the parallel tests.

Aside from the regular benefits of Test::Class::Moose, an interesting benefit of this module is that it loads all of your test and application code into a single process and then forks off subprocesses. As a result, your code is loaded once and only once. Alternate strategies which try to fork before loading your code might still cause the code to be loaded multiple times.

I have used this strategy to reduce a 12 minute test suite to 30 seconds.

Distributed tests

Though I haven't used this module, Alex Vandiver has written TAP::Harness::Remote. This module allows you to rsync directory trees to multiple servers and run tests on those servers. Obviously, this requires multiple servers.

If you want to roll your own version of this, I've also released TAP::Stream, a module that allows you to take streams (the text, actually) of TAP from multiple sources and combine them into a single TAP document.

Devel::CoverX::Covered

There is yet another interesting strategy: only run tests that exercise the code that you're changing. Johan Lindström wrote Devel::CoverX::Covered. This module is used in conjunction with Paul Johnson's Devel::Cover to identify all the places in your tests which cover a particular piece of code. In the past, I've written tools for vim to read this data and only run relevant tests. This is a generally useful approach, but there are a couple of pitfalls.

First, if your test suite takes a long time to run, it will take much, much longer to run with Devel::Cover. As a result, I recommend that this be used with a special nightly "cover build" and have the results synched back to the developers.

Second, when changing code, it's easy to change which tests cover your code, leading to times when this technique won't cover your actual changes thoroughly. In practice, this hasn't been a problem for me, but I've not used it enough to say that with confidence.

Recommendation: Don't settle for slow test suites. Pick a goal and work toward achieving it (it's easy to keep optimizing for too long and start getting diminishing marginal returns).

Test::Class::Moose

If you start creating a large Web site, do you start writing a bunch of individual scripts, each designed to handle one URL and each handling their own database access and printing their output directly to STDOUT? Of course not. Today, professional developers reach for Sinatra, Seaside, Catalyst, Ruby on Rails or other Web frameworks. They take a bit more time to set up and configure, but we know they generally save more time in the long run. Why wouldn't you do that with your test suite?

If you're using Perl, many of the problems listed in this document can be avoided by switching to Test::Class::Moose. This is a testing framework I designed to make it very easy to test applications. Once you understand it, it's actually easy to use for testing libraries, but it really shines for application testing.

Note that I now regret putting Moose in the name. Test::Class::Moose is a rewrite of Test::Class using Moose, but it's not limited to testing Moose applications. It uses Moose because internally it relies on the Moose meta-object protocol for introspection.

Out of the box you get:

Reporting
Parallel tests (which optionally accept a custom schedule)
Tagging tests (slice and dice your test suite!)
Test inheritance (xUnit for the win!)
Full Moose support
Test control methods (startup, setup, teardown, shutdown)
Extensibility
All the testing functions and behavior from Test::Most

To learn about xUnit testing in Perl, you may wish to read a five-part tutorial I published at Modern Perl Books:

That tutorial is slightly out of date (I wrote it in 2009), but it explains effective use of Test::Class and some common anti-patterns when using it.

About The Author

For those of you who may be reading this and are not familiar with me, I am Curtis "Ovid" Poe. I authored the test harness that ships with the Perl programming language. I wrote the well-reviewed book Beginning Perl and am one of the authors of Perl Hacks (how's that for a redundant title?). I also sit on the Board of Directors of the Perl Foundation and am one of the people behind All Around The World, a company offering software development, consulting and training.

If you'd like to hire me to fix your test suite or write software for you, drop me a line at ovid@allaroundtheworld.fr.

Managing a Test Database

Ovid — Mon, 22 Mar 2021 13:36:07 +0000

The Problem

Test databases are very easy to get wrong. Very easy. Decades ago when I first learned testing, the team shared a test database. If you ran your tests at the same time as another developer, both of your test suites would fail! However, we were using a database where we had to pay for individual licenses, so we were limited in what we could do.

Later, I worked for a company using MySQL and I created an elaborate system of triggers to track all database changes. This let me “fake” transactions by starting a test run, seeing what had changed last time, and automatically reverting those changes. It had the advantage that multiple database handles could see each other’s changes (hard to do for many databases if you have separate transactions). Everything else was a disadvantage: it was fragile and slow.

Later, I started using xUnit frameworks, eventually writing a new one that’s popular for companies needing a large-scale testing solution. With this, it was easy for each test class to run in a separate transaction, cleaning itself up as it went. Using transactions provides great isolation, leverages what databases are already good at, and lets you run many classes in parallel.

But it can easily break embedded transaction logic. And you have to guarantee everything shares the same database handle, and you can’t really test the transactions in your code, and, and, and ...

What finally drove me over the edge was writing some code for a client using the Minion job queue. The queue is solid, but it creates new database connections, thus ensuring that it can’t see anything in your database transactions. I figured out a (hackish) solution, but I was tired of hackish solutions.

While I was researching the solution, Matt Trout was reminding me (again) why the “database transactions for tests” approach was broken. Just spawn off temporary test databases and use those, throwing them away when you’re done.

The Client

A company wanting to hire me gave me a technical test and there was a task to add a simple feature to a Catalyst web application. It was trivial. They handed me a Vagrant file and after a quick vagrant up and vagrant ssh, I was ready to begin. Then I looked at the test they had for the controller:

use strict;
use warnings;
use Test::More;

use Catalyst::Test 'Client';

ok( request('/some_path')->is_success, 'Request should succeed' );
done_testing();

The task involved a POST to a URL. There was no test for the existing feature that I was adding to, but any test I wrote meant I’d be changing the state of the database. Run the code multiple times and I’d leave junk in the database. There were various ways I could approach this, but I decided it was time to build a quick database on the fly, write to that, and then dispose of it after. The code for this was trivial:

package Test::DB;

use File::Temp qw(tempfile);
use DBI;
use parent 'Exporter';
use Client::Policy;

BEGIN {
    if ( exists $INC{'Client/Model/ClientDB.pm'} ) {
        croak("You must load Test::DB before Client::Model::ClientDB");
    }
}
use Client::Model::ClientDB;

our @EXPORT_OK = qw(test_dbh);

my $connect_info = Client::Model::ClientDB->config->{connect_info};
my $dsn          = $connect_info->{dsn};
my $user         = $connect_info->{user};
my $password     = $connect_info->{password};

# $$ is the process id (PID)
my $db_name = sprintf 'test_db_%d_%d', time, $$;
my ( $fh, $filename ) = tempfile();

my $dbh = DBI->connect(
    $dsn, $user, $password,
    { RaiseError => 1, AutoCommit => 1 } 
);
$dbh->do("CREATE DATABASE $db_name");
system("mysqldump -u $user --password=$password test_db > $filename") == 0
  or croak("mysqldump failed: $?");
system("mysql -u $user --password=$password $db_name < $filename") == 0
  or croak("importing schema to mysql failed: $?");

# XXX We’re being naughty in this quick hack. We’re writing
# this back to the Model so that modules which use this connect 
# to the correct database.
$connect_info->{dsn} = "dbi:mysql:$db_name";

# This is just a quick hack to get tests working for this code.
# A catastrophic failure in the test means this might not get
# run and we have a junk test database lying around.
# Obviously we want something far more robust

END { $dbh->do("DROP DATABASE $db_name") }

sub test_dbh () { $dbh }

1;

The above is very naughty in many ways, but the client hinted that how fast I returned the test might be a factor (or maybe they didn’t and I misread the signals). They also made it clear they were looking at how I approached problems, not whether or not the code was perfect. Thus, I thought I was on safe territory. And it meant I could do this in my test:

use strict;
use warnings;
use Test::More;
use lib 't/lib';
use Test::DB;

use Catalyst::Test 'Client';

ok( request('/some_path')->is_success, 'Request should succeed' );

# anything I do here is against a temporary test database
# and will be discarded when the test finishes

done_testing();

The Test::DB code was quick and easy to write and made it trivial for me to safely write tests. I was pleased.

What’s Wrong With Test::DB?

For a junior developer, Test::DB might look awesome. For an experienced developer, it’s terrible. So what would I do to make it closer to production ready?

Here are just a few of the things I would consider.

Stronger Data Validation

First, let’s look at our connection information:

my $connect_info = Client::Model::ClientDB->config->{connect_info};
my $dsn          = $connect_info->{dsn};
my $user         = $connect_info->{user};
my $password     = $connect_info->{password};

The above relied on how Catalyst often sets up its DBIx::Class (a Perl ORM) model:

package Client::Model::ClientDB;

use strict;
use base 'Catalyst::Model::DBIC::Schema';

__PACKAGE__->config(
    schema_class => 'Client::Schema::ClientDB',
    connect_info => {
        dsn      => 'dbi:mysql:test_db',
        user     => 'root',
        password => 'rootpass',
    }
);

Once you load that class, you get a config class method which can tell you how that class is configured. However, there’s no guarantee in the Test::DB side that the data is structured the way that I expect. Thus, I need to validate that data and throw an exception immediately if something has changed.
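Here is a minimal sketch of that validation. It assumes the hash-reference form of connect_info shown above and the MySQL-only tooling that Test::DB shells out to. The function name validate_connect_info is hypothetical. You would call it right after reading Client::Model::ClientDB->config->{connect_info}. Anything unexpected dies immediately instead of letting the tests run against the wrong database. That includes the array-reference form of connect_info, which Catalyst::Model::DBIC::Schema also accepts.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical helper: fail fast if the model's connect_info no longer
# has the shape Test::DB relies on.
sub validate_connect_info {
    my ($connect_info) = @_;

    croak("connect_info must be a hash reference")
      unless ref $connect_info eq 'HASH';

    # Test::DB reads exactly these three keys.
    for my $key (qw(dsn user password)) {
        croak("connect_info is missing '$key'")
          unless defined $connect_info->{$key};
    }

    # Test::DB shells out to mysqldump and mysql, so only MySQL makes sense.
    croak("Expected a MySQL DSN, got '$connect_info->{dsn}'")
      unless $connect_info->{dsn} =~ /\A dbi:mysql: /xi;

    return $connect_info;
}
```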

And how do we create our test database?

$dbh->do("CREATE DATABASE $db_name");
system("mysqldump -u $user --password=$password test_db > $filename") == 0
  or croak("mysqldump failed: $?");
system("mysql -u $user --password=$password $db_name < $filename") == 0
  or croak("importing schema to mysql failed: $?");

The CREATE DATABASE command is fast, so I’m not worried about that. And the test had a single table with very little data, so this was quick. But for Tau Station, we have a couple of hundred tables and tons of data. This would be slow. For any reasonably mature system, dumping the database each time would be a bad idea. There are also ways you could easily avoid dumping it multiple times, but that hits the next problem: adding that data to your new test database. That would need to be done for each test and that is not something you can trivially speed up.

For a more robust system, I’d probably create a local database service that would simply build a set number of test databases and have them waiting. The test would request the next test database, the service would register that the database had been taken, and create a new test database in the background while your test runs. The service would also probably clean up old test databases based on whatever policies you think are appropriate.
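The core bookkeeping of such a service can be sketched in a few lines. This is a minimal, single-process sketch with heavy assumptions:

- TestDBPool is a hypothetical name.
- The create callback stands in for CREATE DATABASE plus loading the schema and fixtures.
- Replacements are built synchronously, not in the background.

A real service would run as its own daemon and persist which databases are taken.

```perl
package TestDBPool;    # hypothetical name, not an existing CPAN module

use strict;
use warnings;
use Carp qw(croak);

# Keep a fixed number of ready-made test databases waiting and hand
# them out one at a time, recording which ones have been taken.
sub new {
    my ( $class, %args ) = @_;
    my $self = bless {
        size        => $args{size}   // 3,
        create      => $args{create} // croak("'create' callback required"),
        ready       => [],    # databases built and waiting
        checked_out => {},    # name => checkout time
    }, $class;
    $self->_refill;
    return $self;
}

# Top the pool back up. A real service would do this in the background.
sub _refill {
    my ($self) = @_;
    push @{ $self->{ready} }, $self->{create}->()
      while @{ $self->{ready} } < $self->{size};
    return;
}

# Hand out the next waiting database and register that it is taken.
sub checkout {
    my ($self) = @_;
    my $db_name = shift @{ $self->{ready} };
    $self->{checked_out}{$db_name} = time;
    $self->_refill;
    return $db_name;
}

sub checked_out { my ($self) = @_; return sort keys %{ $self->{checked_out} } }

package main;

my $counter = 0;
my $pool    = TestDBPool->new(
    size   => 2,
    # Stand-in for CREATE DATABASE plus loading schema and fixtures.
    create => sub { sprintf 'test_db_pool_%d', ++$counter },
);

# A test asks for a database and gets one that is already built.
my $db = $pool->checkout;
```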

No Action At A Distance

This line is terrible:

$connect_info->{dsn} = "dbi:mysql:$db_name";

That works because the config data in Client::Model::ClientDB is global and mutable and $connect_info is merely a reference to that data. Instead, if I have a "database service" that tells the code which database it can use, then Test::DB can call that service, and so can Client::Model::ClientDB. Everything relies on a single source of truth instead of hacking global variables and hoping you don’t mess up.

Don’t Drop The Test Database

If there is one thing I hate about many testing systems, it’s watching a test fail horribly, only to find that the database has been cleaned up (or dropped) and I can’t see the actual data after the test is done. What I often have to do is fire up the debugger, run the code up to the test failure, grab a database handle and try to inspect the data that way. It’s a mess.

Here, we can fix that by simply dropping this line:

END { $dbh->do("DROP DATABASE $db_name") }

At the beginning and end of every test run, we can diag the test database name, so if there’s an issue in the database, we can still inspect it. Our database service would have code to drop the database on:

The next day
The next test run
After exceeding a threshold of databases
... or whatever else you need
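Such a cleanup policy is easy to express once database names carry their creation time. Test::DB's test_db_<epoch>_<pid> names already do. The sketch below implements only the "next day" and "threshold" rules, and the function name databases_to_drop is hypothetical. It decides what to drop. The actual DROP DATABASE calls, and the diag of the current name via Test::More's diag, are left to the caller.

```perl
use strict;
use warnings;

# Hypothetical policy helper: given leftover database names, return the
# ones to drop. Names that don't match the Test::DB pattern are ignored.
sub databases_to_drop {
    my (%args)   = @_;
    my $now      = $args{now}      // time;
    my $max_age  = $args{max_age}  // 24 * 60 * 60;    # "the next day"
    my $max_keep = $args{max_keep} // 20;              # threshold

    my @dbs;
    for my $name ( @{ $args{names} } ) {
        next unless $name =~ /\A test_db_(\d+)_\d+ \z/x;
        push @dbs, { name => $name, created => $1 };
    }

    # Newest first, so the threshold keeps the most recent databases.
    @dbs = sort { $b->{created} <=> $a->{created} } @dbs;

    my @drop;
    for my $i ( 0 .. $#dbs ) {
        my $db = $dbs[$i];
        push @drop, $db->{name}
          if $now - $db->{created} > $max_age || $i >= $max_keep;
    }
    return @drop;
}
```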

In short, keep the valuable data around for debugging.

Rapid Database Development

The database service solution would also have to tie into your database change management strategy. I heavily use sqitch to manage database changes and I’ve written a lot of code to support odd edge cases. It wouldn’t be hard to write code to let the database service see if it’s up to date with your local version of sqitch. Whatever database change management strategy you use, it needs to be discoverable to properly automate the database service.

Of course, you think, that’s obvious. Yet you’d be shocked how many times I’ve worked with clients whose database change management strategy involves listing a bunch of SQL files and checking their mtime to see which ones need to be applied to your database. Yikes!

Faster Tests

If this is done well, your tests should also be faster. You won’t have the overhead of transactions beyond what your code already has. Plus, you can avoid issues like this:

sub test_basic_combat_attack_behavior ($test,$) {
    my $ovid    = $test->load_fixture('character.ovid');
    my $winston = $test->load_fixture('character.winston');
    my $station = $test->load_fixture('station.tau-station');

    $test->move_to($station->port, $ovid, $winston);
    ok !$ovid->attack($winston),
      'We should not be able to attack someone on the home station.';
    ...
}

In the above, we’re loading some fixtures. Sometimes those fixtures are very complicated and loading them takes time. For one client, when I would run $test->load_all_fixtures('connection');, that would add an extra couple of seconds to every test which needed to do that.

Instead, pre-built test databases can have the test fixtures already loaded. Further, a pre-populated database puts your code in front of something closer to real-world data, instead of an empty database that hides the corner cases real data would expose.

Conclusion

By using a database service which merely hands you a temporary test database, you don’t have to worry about leaving the database a mess, managing transactions in tests, or having nasty hacks in your tests to work around these issues. Most importantly, you’re not changing the behavior of your code. You just use the database like normal. It might be a little bit more work up front to create that database, but it’s worth the effort.

I really do want to get around to creating a proper database tool like this some day. Today is not that day. But I was delighted how even my quick hack, written in just a couple of minutes, made it so much easier to test my code. I should have done this ages ago.

Three-Number Project Management

Ovid — Fri, 19 Mar 2021 13:59:40 +0000

Congrats on Your New Role!

So there you are, a brand new project manager (PM), or product owner (PO) and ... wait a minute. What the heck's the difference? In many agile projects, there is no project manager and the PM's responsibility is distributed across the various members of the team: product owner, developers, ScrumMaster (Scrum), Coach (XP), and so on. Thus, the definition of a PM is sometimes muddied and in my experience, varies wildly across organizations.

But because the waters are often muddied here, I’m going to handwave this entire issue and hope you don’t notice. Instead, we’ll focus on an age-old problem in the PM/PO space: what should we do next?

So ...

What Should We Do Next?

As a fledgling team lead, PO, PM, or whatever title your company uses for the person responsible for getting stuff done, figuring out what to work on next is a daunting task. It involves having an in-depth knowledge of the project, the product, the business, and the team. And when you're juggling 27 things at once, it can be maddening to have a bunch of baby birds cheeping "what do I do next?", demanding you vomit up user stories for them to work on.

You might think that backlog grooming/refinement is all you need, but that still doesn't tell you, out of the 374 tasks that need to be done, which should be done next. Fortunately, there's a very simple way to do this, and that's to build a business case for each task, using just three numbers.

The Three Number Business Case

A business case is really nothing more than making an argument that a thing should or should not be done, but using numbers to back it up. If you want to learn more, I have a "rescuing a legacy codebase" talk that covers this.

But what numbers do we use? For this case, try to pick the three most important numbers that represent value in your tasks. At one point, when I had the project management role in developing Tau Station, a free-to-play (F2P) narrative sci-fi adventure, the complexity of the role was overwhelming, but each task's "three most important numbers" were clear: complexity, usability, monetization.

The complexity of a task is simply how many "story points" (or whatever you call them) that team members have assigned to a task. I recommend using Fibonacci numbers for story point estimates, especially for what I'm about to show you.

The usability of a task was how useful it was to the customers. Again, you want to use Fibonacci numbers here. You need a feel for what this number is because, in reality, you cannot get a perfect estimate. However, it's clear that a feature which lets customers download reports is more valuable to them than a feature which slightly adjusts the color of a sidebar, so the former might get the number "13" while the latter might get the number "1".

In the case of Tau Station, everyone is allowed to play the game for free, but as with many F2P games, there are monetization options in the game, and things which increase the likelihood of monetization get a higher number. A "Happy Hour" feature, where players get extra rewards when they buy stuff, gets a higher monetization number than a feature to make forum posts "sticky." Again, you're using Fibonacci numbers here.

Important: the three things you choose should reflect your own business concerns, which may be vastly different from ours. You may choose "security" or "legal compliance" or something else entirely. Tau Station included these as part of the "usability" number. Think hard about these numbers because you don't want to get them wrong.

Once you have your three numbers, what next? Well, you need to assign weights to them. This will largely be a process of trial and error until it feels right. In our case, let's say that both monetization and complexity get a weight of 40 and usability gets a 20. Then you play around with formulae until you get a final score, with higher-scoring tasks having a greater priority. Here's the formula I used for Tau Station, though you will need to adjust this for your own needs:

pros  = monetization * 40 + usability * 20
cons  = complexity * 40
scale = 10
score = ( ( pros - cons ) + 500 ) / scale

Let's say a task has a monetization of 1, a usability of 5, and a complexity of 7. How does that compare with a task that has a monetization of 3, a usability of 3, but a complexity of 13? The first task might earn us less money, but it's more useful and easier to implement. Decisions, decisions ...

Well, monetization and usability are both "pros" (benefits), and complexity is a "con" (drawback), so we have task 1:

pros  = 1 * 40 + 5 * 20
cons  = 7 * 40
scale = 10
score = ( ( 140 - 280 ) + 500 ) / 10

And task 2:

pros  = 3 * 40 + 3 * 20
cons  = 13 * 40
scale = 10
score = ( ( 180 - 520 ) + 500 ) / 10

Task 1 has a final score of 36, while task 2 has a final score of 16. Thus, we do task 1 first.

Once you learn to plug those three numbers into a spreadsheet, you can then sort the tasks by score and prioritize tasks at the top of the list.
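The formula is also easy to reproduce outside a spreadsheet. This Perl sketch uses the weights from the formula above: monetization 40, usability 20, complexity 40, offset 500 and scale 10. It scores the two example tasks and sorts them the way you would sort the spreadsheet. The hash keys and the score function are illustrative names, not part of any tool.

```perl
use strict;
use warnings;

# Weights and constants from the Tau Station formula; tune for your project.
my %WEIGHT = ( monetization => 40, usability => 20, complexity => 40 );
my $OFFSET = 500;
my $SCALE  = 10;

sub score {
    my ($task) = @_;
    my $pros = $task->{monetization} * $WEIGHT{monetization}
             + $task->{usability}    * $WEIGHT{usability};
    my $cons = $task->{complexity} * $WEIGHT{complexity};
    return ( ( $pros - $cons ) + $OFFSET ) / $SCALE;
}

# Illustrative backlog; in practice these rows come from your tracker.
my @tasks = (
    { name => 'Task 2', monetization => 3, usability => 3, complexity => 13 },
    { name => 'Task 1', monetization => 1, usability => 5, complexity => 7 },
);

# Highest score first, like sorting the spreadsheet.
my @prioritized = sort { score($b) <=> score($a) } @tasks;

printf "%-8s %5.1f\n", $_->{name}, score($_) for @prioritized;
```

Task 1 comes out on top, matching the hand calculation for it.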

Real World Usage

When I was doing this, I found it made my work much easier and the team appreciated it because it was clear direction. Plus, since the team provides the complexity estimates, they know that they have a real impact on project direction, something that teams usually appreciate.

I used an Excel spreadsheet, but also a small program that I wrote which fetched data from GitHub and ZenHub to automate the creation of my spreadsheet. Every month I'd repeat the process, fill in missing numbers, sort the rows and share it with the team via a Google Doc. It worked very well and figuring out the project tasks for the month was no longer a stressful affair.

Note that sometimes you will find tasks which don't seem to be scored correctly with this technique. When that happens, investigate whether or not your three numbers seem reasonable. Maybe your weights are wrong, or your formula is off.

Or, as sometimes happens, there are external factors. A task may be complex, offer low usability, and no monetization potential, but another team is demanding that XML feed now and hey, you have to play nice with others. When odd events like that happen, use your judgment instead of your spreadsheet.

At the end of the day, this technique, like all others, isn't perfect. When used correctly, however, it's a great tool for building value and focusing on your real needs. Further, you prioritize greater value and, when asked to justify your decisions, you have numbers to back them up. Even if they're not perfect, quibbling over numbers and formulae instead of hunches is a much safer career move.

Fixing a 40-year-old Software Bug

Ovid — Tue, 16 Mar 2021 09:44:59 +0000

Note: I have exact numbers for this because I originally wrote this on the day I found the bug.

I was working on an ETL system designed to reduce the cost of Phase III clinical trials. In doing so, I was reading some data and I processed 36,916 potential dates. Two of those 36,916 failed to validate. I wasn't concerned as these dates came from clients at big pharmaceutical companies handing us spreadsheets that often had only a vague match with our specification. When you work with clients much larger than you, you often just grin and bear it and die a little inside each day (which reminds me, I need to write up my hell with Yahoo's now-extinct IDIF format some day).

On that day, however, the pharmaceuticals were blameless. When I inspected the raw data, the failed dates were January 1st, 2011 and January 1st, 2007. I knew those dates. This wasn't sloppy data from a client. I had a bug in software I had just written, but this bug was first released in 1983.

For anyone who doesn't understand the software ecosystem, this may sound mystifying, but it makes sense. Because of a decision taken a long time ago to make another company money, my client lost money paying me to fix a bug that one company accidentally introduced and another company deliberately introduced. But to explain it I need to talk about a third company that introduced a feature that eventually became a bug, and a few other historical tidbits that nonetheless contributed to the obscure bug I fixed that day.

History

System Clocks

In the good ol' days, Apple computers would sometimes spontaneously reset their date to January 1st, 1904. The reason for this is simple. Back then, Apple computers used battery-powered "system clocks" to keep track of the date and time, and they tracked the date as the number of seconds since the epoch. In this sense, an epoch is merely a reference date from which we start counting, and for Macintosh computers, that epoch was January 1st, 1904. So what happened when the battery ran out? The count fell back to zero, January 1st, 1904 became your new date, and that caused a curious problem.

Back then, Apple used 32 bits (ones and zeros) to store the number of seconds from their start date. One bit can hold one of two values, 0 or 1. Two bits can hold one of four values, 00, 01, 10, 11. Three bits can hold one of eight values, 000, 001, 010, 011, 100, 101, 110, 111, and so on. How much can 32 bits hold? 32 bits can hold one of 2³², or 4,294,967,296, values. 2³² seconds is just over 136 years, which is why older Macs couldn't handle dates after 2040. And if your system clock battery died, your date would reset to 0 seconds after the epoch, and you'd have to manually reset the date every time you turned on your computer (or until you bought a new battery for your system clock).
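The arithmetic is easy to check. This sketch assumes an unsigned 32-bit counter starting at midnight UTC on January 1st, 1904. It uses Perl's core gmtime to find the exact moment the counter wraps; gmtime handles timestamps past 2038 on Perl 5.12 and later.

```perl
use strict;
use warnings;

# Number of distinct values an unsigned 32-bit counter can hold.
my $max_seconds = 2**32;

# Average Gregorian year: 365.2425 days of 86,400 seconds.
my $years = $max_seconds / ( 365.2425 * 86_400 );

# Seconds from the Mac epoch (1904-01-01) to the Unix epoch (1970-01-01):
# 66 years, 17 of which (1904, 1908, ..., 1968) are leap years.
my $mac_to_unix = ( 66 * 365 + 17 ) * 86_400;

# The moment the counter wraps, converted to a Unix timestamp.
my @t = gmtime( $max_seconds - $mac_to_unix );
my $rollover = sprintf '%04d-%02d-%02d %02d:%02d:%02d',
  $t[5] + 1900, $t[4] + 1, @t[ 3, 2, 1, 0 ];

printf "2**32 seconds is about %.1f years; the clock wraps at %s UTC\n",
  $years, $rollover;
```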

However, the Apple solution of storing dates as the number of seconds after the epoch means we couldn't handle dates before the epoch and that had far-reaching implications, as we'll see. This was a feature, not a bug, that Apple introduced. It meant, amongst other things, that the Macintosh operating system was immune to the Y2K bug. (Ironically, many Mac apps weren't immune because they would introduce their own date system to work around the Mac limitations.)

Lotus 1-2-3

Moving along, we have Lotus 1-2-3, the IBM PC's "killer app" that helped to launch the PC revolution, though it was VisiCalc on the Apple that really launched the personal computer. It's fair to say that if 1-2-3 hadn't come along, PCs would likely not have taken off as quickly as they did and computer technology would have turned out considerably differently. However, Lotus 1-2-3 incorrectly treated 1900 as a leap year. (In literary terms, that sentence is what we call "foreshadowing.")

When Microsoft released Multiplan, their first spreadsheet program, it didn't have much market penetration. So when they conceived of Excel, they decided not only to copy 1-2-3's row/column naming scheme, but also to make it bug-for-bug compatible, including treating 1900 as a leap year, a problem that remains to this day. This wasn't to be sneaky; they needed Excel to be able to import Lotus 1-2-3 spreadsheets. So for 1-2-3, this was a bug, but for Excel, it was a feature, even if it meant sometimes getting dates wrong.

Epochs

When Microsoft wanted to release Excel for Apple's Macintosh computers, they had a problem. As mentioned, Macintosh didn't recognize dates prior to January 1st, 1904. However, Excel used January 1st, 1900 as its epoch. So Excel was modified to recognize what the epoch was and internally stored dates relative to these respective epochs. This Microsoft support article explains the problem fairly clearly. And that leads to my bug.

My Bug

My client received spreadsheets from many customers. Those spreadsheets may have been produced on Windows, but they may have been produced on a Mac. As a result, the "epoch" date for the spreadsheets might be January 1st, 1900 or January 1st, 1904. How do you know which one? Well, the Excel file format exposes this information, but the parser I was using did not and it expected you to know whether you have a 1900 or 1904-based spreadsheet. I suppose I could have spent a lot of time trying to figure out how to read the binary format of Excel and sent a patch to the maintainer of the parser, but I had many other things to do for my client and so I quickly wrote a heuristic to determine whether or not a given spreadsheet was 1900 or 1904. It was pretty simple.

In Excel, you may have a date of July 5, 1998, but it might be formatted as "07-05-98" (the useless US system), "Jul 5, 98", "July 5, 1998", "5-Jul-98" or any of a number of other useless formats (ironically, the one format my version of Excel didn't offer was the standard ISO 8601 format). Internally, however, the unformatted value is either "35981", for the 1900 date system, or "34519", for the 1904 system (these numbers represent the number of days after the epoch). So I used a robust date parser to extract the year from the formatted date, and then an Excel date parser to extract the year from the unformatted value. If they're four years apart, I know I'm using the 1904 date system.

So why didn't I simply use the formatted date? Because July 5, 1998 might be formatted as "July, 98", losing me the day of the month. We get our spreadsheets from so many companies and they create them in so many different ways that they expect us (meaning me, in this case) to figure it out. After all, if Excel gets it right, I should, too!

That's when 39082 kicked me in the tail. Remember how Lotus 1-2-3 considered 1900 a leap year and how that was faithfully copied to Excel? Because it adds an extra day to 1900, many date calculation functions relying on this can easily be off by a day. That means that 39082 might be January 1st, 2011 (on Macs), or it might be December 31st, 2006 (on Windows). If my "year parser" extracts 2011 from the formatted value, well, that's great. But since the Excel parser doesn't know whether it's a 1900 or 1904 date system, it defaults to the common 1900 date system and returns 2006 as the year. My software saw that the years were five years apart, assumed an error, logged it, and returned the unformatted value.

To work around this, I wrote the following (pseudo-code):

difference = formatted_year - parsed_year
if 0 == difference:
    assume 1900 date system
if 4 == difference:
    assume 1904 date system
if 5 == difference && 12 == month && 31 == day:
    assume 1904 date system

And all 36,916 dates parsed correctly.
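Here is a self-contained sketch of the same heuristic. It is not the code from that project. The original relied on a date parser and an Excel parser that aren't named here, whereas this version converts serials itself using Howard Hinnant's days-to-civil algorithm. The function names are hypothetical.

It assumes 1900-system serials count from 1899-12-30 starting at serial 61, which accounts for the phantom February 29th, 1900. It assumes 1904-system serials count from January 1st, 1904.

```perl
use strict;
use warnings;

# Days since 1970-01-01 -> (year, month, day), via Howard Hinnant's
# civil_from_days algorithm (simplified: assumes dates after year 0).
sub civil_from_days {
    my ($z) = @_;
    $z += 719_468;                                   # shift epoch to 0000-03-01
    my $era = int( $z / 146_097 );                   # 400-year eras
    my $doe = $z - $era * 146_097;                   # day of era
    my $yoe = int( ( $doe - int( $doe / 1460 ) + int( $doe / 36_524 )
                     - int( $doe / 146_096 ) ) / 365 );
    my $doy = $doe - ( 365 * $yoe + int( $yoe / 4 ) - int( $yoe / 100 ) );
    my $mp  = int( ( 5 * $doy + 2 ) / 153 );         # March-based month
    my $d   = $doy - int( ( 153 * $mp + 2 ) / 5 ) + 1;
    my $m   = $mp < 10 ? $mp + 3 : $mp - 9;
    my $y   = $yoe + $era * 400 + ( $m <= 2 ? 1 : 0 );
    return ( $y, $m, $d );
}

# Interpret an Excel serial date under either date system.
sub excel_serial_to_ymd {
    my ( $serial, $system ) = @_;
    # 1904 system: serial 0 is 1904-01-01, which is day -24107.
    return civil_from_days( $serial - 24_107 ) if $system == 1904;

    # 1900 system: serial 60 is the phantom 1900-02-29 inherited from 1-2-3.
    die "Serial 60 is the nonexistent 1900-02-29\n" if $serial == 60;
    return civil_from_days( $serial - ( $serial < 60 ? 25_568 : 25_569 ) );
}

# Parse the serial as a 1900 date and compare its year with the year
# shown in the formatted cell.
sub guess_date_system {
    my ( $formatted_year, $serial ) = @_;
    my ( $year, $month, $day ) = excel_serial_to_ymd( $serial, 1900 );
    my $difference = $formatted_year - $year;
    return 1900 if $difference == 0;
    return 1904 if $difference == 4;
    # A 1904-system January 1st reads as December 31st, five years
    # earlier, in the 1900 system because of the phantom leap day.
    return 1904 if $difference == 5 && $month == 12 && $day == 31;
    return;    # unknown: the caller should log it
}
```

With this, guess_date_system(2011, 39082) returns 1904, because the serial parses as December 31st, 2006 in the 1900 system.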

As an aside, according to an anecdote from Joel Spolsky, the Lotus 1-2-3 "bug" may have been a deliberate attempt to simplify the Lotus software.