Software Quality (8 Part Series)
Yes, I know: that title. Still, cyclomatic complexity is a subject every developer and tester should understand, no matter how nerdy the name is. Why? Because bad code encourages bugs to spread and multiply, leaving them waiting to surprise hapless users who try something you didn't test.
Before I get into what cyclomatic complexity is, let’s talk for a second about maintainability and coding practices.
If you ask me what I look for in code for maintainability, I'll tell you a number of things, from the single responsibility principle to documentation to parameter count, and more. But what I'll usually talk about most is size.
Specifically, I encourage keeping class files to 250 lines at most and methods to 20 lines at most.
Why? It’s not just because I enjoy the tears of new developers; it’s because I’ve learned from monstrosities I’ve built in the past.
What I’ve learned is this:
Large methods are an open invitation for bugs to move into your codebase. Large classes are a broken window telling you that the normal rules of software architecture do not apply to this file. Both of these things are extremely bad for software maintainability and quality.
In this article, I want to delve into that method size limit and explain what I’m talking about. In order to do that, we need to talk about the elephant in the room: Cyclomatic Complexity.
Put simply, Cyclomatic Complexity is the number of independent paths through a piece of code.
Take the following code as an example:
This method is fairly simple, but still has a number of unique paths through it:
- If the actor is not a player, the routine will proceed straight to line 15.
- If the actor is a player and the object the actor is attempting to interact with is corrupted, the system will use the message at line 7.
- If the actor is a player but the object is not corrupted, the message at line 11 will be used.
In all cases, line 15 will return false.
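The original listing isn't reproduced here, so the following is a hypothetical TypeScript sketch of the logic just described. The `Actor` and `GameObject` types, the `isPlayer` and `isCorrupted` flags, the `messages` array, and the message strings are all assumptions for illustration; only the branching structure comes from the description above.

```typescript
// Hypothetical types: the engine's real classes are not shown in the article.
interface Actor {
  isPlayer: boolean;
}

interface GameObject {
  isCorrupted: boolean;
}

// Hypothetical stand-in for whatever messaging system the engine uses.
const messages: string[] = [];

function interact(actor: Actor, target: GameObject): boolean {
  if (actor.isPlayer) {
    if (target.isCorrupted) {
      // Path 2: a player touching a corrupted object
      messages.push("The corruption recoils from your touch.");
    } else {
      // Path 3: a player touching a clean object
      messages.push("Nothing happens.");
    }
  }
  // Path 1 (non-player) skips straight here; every path returns false.
  return false;
}

// Exercise all three paths: only the two player paths add a message.
interact({ isPlayer: false }, { isCorrupted: true });
interact({ isPlayer: true }, { isCorrupted: true });
interact({ isPlayer: true }, { isCorrupted: false });
console.log(messages);
```

Three calls are enough to cover every path, which is exactly what a cyclomatic complexity of 3 predicts.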
Let’s look at this a different way – as a graph:
In this case, we represent the code as a graph of code blocks (nodes) and count the connections (edges) between them.
This can be expressed mathematically using the formula:
cyclomaticComplexity = edges - nodes + 2
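Note: The +2 is really 2P, where P is the number of connected components in the graph; a single method forms one component. If you’re curious about this formula or where it comes from, check out the very detailed Wikipedia article on the subject.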
In our case, there are 5 nodes and 6 edges, yielding a cyclomatic complexity of 3 (6 - 5 + 2).
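The arithmetic is easy to automate. Here is a minimal TypeScript sketch that stores a control-flow graph as a list of edges and applies the formula, assuming a single method (one connected component). The block names such as `entry` and `checkCorrupted` are hypothetical labels for the five blocks of the example method.

```typescript
// A directed edge between two code blocks in a control-flow graph.
type Edge = [from: string, to: string];

// M = E - N + 2P, with P = 1 because we analyze a single method.
// Nodes are collected from the edge list, which assumes every block
// has at least one edge (true for any connected graph of 2+ nodes).
function cyclomaticComplexity(edges: Edge[]): number {
  const nodes = new Set<string>();
  for (const [from, to] of edges) {
    nodes.add(from);
    nodes.add(to);
  }
  return edges.length - nodes.size + 2;
}

// Hypothetical block names for the interaction example: 5 nodes, 6 edges.
const interactionGraph: Edge[] = [
  ["entry", "checkCorrupted"],            // actor is a player
  ["entry", "returnFalse"],               // actor is not a player
  ["checkCorrupted", "corruptedMessage"], // object is corrupted
  ["checkCorrupted", "normalMessage"],    // object is clean
  ["corruptedMessage", "returnFalse"],
  ["normalMessage", "returnFalse"],
];

console.log(cyclomaticComplexity(interactionGraph)); // 3
```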
Cyclomatic complexity is like a golf score: lower is better. A cyclomatic complexity of 3 is very manageable and well within the commonly recommended maximum of 10, the limit Thomas McCabe originally proposed.
Now let’s look at some more complex code:
In this code taken from a roguelike game engine, we’re handling spreading corruption from one area to another based on predefined rules mixed with some randomness. It’s not a lot of code at 32 lines, but it is becoming difficult to take in at once.
Let’s take a look at this code as a graph:
Right away the return statement at line 18 jumps out at you. Control flow statements like return, break, and continue all play into the cyclomatic complexity of a method and need to be accounted for.
Note also that loops count as conditionals, since the loop test decides whether to enter the block. Even though each iteration of the loop may or may not enter the block, we count that conditional only once.
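To see how an early return and a loop show up in the count, here is a small hypothetical TypeScript function (not the article's corruption code) alongside a hand-built graph of its blocks. It also checks a handy shortcut that holds when every branch is a two-way decision and all returns flow into one exit node: the complexity equals the number of decisions plus one. The function name `firstCorrupted` and the block names are assumptions for illustration.

```typescript
type Edge = [from: string, to: string];

// M = E - N + 2 for a single method.
function cyclomaticComplexity(edges: Edge[]): number {
  const nodes = new Set<string>();
  for (const [from, to] of edges) {
    nodes.add(from);
    nodes.add(to);
  }
  return edges.length - nodes.size + 2;
}

// Hypothetical example: index of the first corrupted tile, or -1.
function firstCorrupted(tiles: boolean[]): number {
  for (let i = 0; i < tiles.length; i++) { // decision 1: loop test
    if (tiles[i]) {                        // decision 2: if test
      return i;                            // early return: its own edge to exit
    }
  }
  return -1;
}

// Control-flow graph of firstCorrupted: 7 nodes, 8 edges.
const loopGraph: Edge[] = [
  ["init", "loopTest"],
  ["loopTest", "ifTest"],       // i < tiles.length
  ["loopTest", "returnMinus1"], // loop finished without a match
  ["ifTest", "returnI"],        // tile is corrupted
  ["ifTest", "increment"],      // tile is clean
  ["increment", "loopTest"],    // back edge: counted once, not per iteration
  ["returnI", "exit"],
  ["returnMinus1", "exit"],
];

const decisions = 2; // loop test + if test
console.log(cyclomaticComplexity(loopGraph), decisions + 1); // 3 3
```

Both routes agree on 3, and the early return costs nothing extra on its own: the `if` guarding it is what adds the path.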
All told, we have 10 nodes and 14 edges, leading to a cyclomatic complexity of 6 (14 – 10 + 2).
While this method's cyclomatic complexity is still below 10, it is clearly approaching levels that are harder to manage.
I’m not saying that you can’t fit all of this method into your head in one read, only that it’s harder to do so than it would be with a smaller method.
Because it’s harder to fit the whole method into your head, it’s easier to forget to test something while you’re thinking about other portions of the code. The larger size and added complexity create places where bugs can hide and where paths you forgot to check go untested.
Because of this and other weaknesses of cyclomatic complexity, various people have devised metrics for how much cognitive complexity code places on the reader.
What do I mean by weaknesses? Well, let’s take a look at a relatively simple block of code:
This is trivial to read and understand. The code also has little chance of breaking during maintenance. However, let’s look at the code’s graph and cyclomatic complexity:
Holy compiler, Batman! I count 6 nodes and 8 edges in 8 lines of code! That results in a cyclomatic complexity of 4 (8-6+2). In other words, no matter how simple each case in a switch is, every case still adds 1 to the cyclomatic complexity.
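Here is a hypothetical TypeScript switch with four branches (three cases plus a default), together with the graph that produces 6 nodes and 8 edges. The `directionName` function and its labels are invented for illustration; the article's original switch is not reproduced here.

```typescript
type Edge = [from: string, to: string];

// M = E - N + 2 for a single method.
function cyclomaticComplexity(edges: Edge[]): number {
  const nodes = new Set<string>();
  for (const [from, to] of edges) {
    nodes.add(from);
    nodes.add(to);
  }
  return edges.length - nodes.size + 2;
}

// Hypothetical, trivially readable switch: one branch per case.
function directionName(direction: string): string {
  switch (direction) {
    case "n":
      return "North";
    case "s":
      return "South";
    case "e":
      return "East";
    default:
      return "West";
  }
}

// switch node -> 4 branch nodes -> exit: 6 nodes, 8 edges.
const switchGraph: Edge[] = [
  ["switch", "north"], ["switch", "south"],
  ["switch", "east"], ["switch", "west"],
  ["north", "exit"], ["south", "exit"],
  ["east", "exit"], ["west", "exit"],
];

console.log(cyclomaticComplexity(switchGraph)); // 4
```

Adding a fifth case adds one node and two edges, so the score rises to 5 even though the code is no harder to read.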
So there’s clearly a difference between testability, which focuses on the distinct paths through a system, and cognitive complexity, which focuses on how hard the code is for a person to understand.
While there are a wide variety of cognitive complexity formulas out there, my favorite that I have worked with is the implementation used by the SonarQube source code analysis tool.
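To make the contrast concrete, here is a deliberately simplified TypeScript sketch that scores a tiny made-up syntax tree both ways. The cognitive score borrows only two of SonarQube's ideas (a switch counts once as a whole, and nested structures pay an extra point per level of nesting); it is not SonarQube's full algorithm, and the `Block` tree type is an assumption for illustration.

```typescript
// A tiny, hypothetical syntax tree: just enough structure to compare metrics.
type Block =
  | { kind: "stmt" }
  | { kind: "if"; body: Block[] }
  | { kind: "loop"; body: Block[] }
  | { kind: "switch"; cases: Block[][] };

// Cyclomatic complexity by decision counting: 1 for straight-line code,
// +1 per if or loop, and a switch with n branches adds n - 1.
function cyclomatic(blocks: Block[]): number {
  let total = 1;
  const visit = (list: Block[]): void => {
    for (const b of list) {
      if (b.kind === "if" || b.kind === "loop") {
        total += 1;
        visit(b.body);
      } else if (b.kind === "switch") {
        total += b.cases.length - 1;
        for (const c of b.cases) visit(c);
      }
    }
  };
  visit(blocks);
  return total;
}

// Simplified cognitive complexity: each if/loop/switch costs 1 plus its
// nesting depth, and a switch counts once no matter how many cases it has.
function cognitive(blocks: Block[], depth = 0): number {
  let total = 0;
  for (const b of blocks) {
    if (b.kind === "if" || b.kind === "loop") {
      total += 1 + depth + cognitive(b.body, depth + 1);
    } else if (b.kind === "switch") {
      total += 1 + depth;
      for (const c of b.cases) total += cognitive(c, depth + 1);
    }
  }
  return total;
}

// A flat four-branch switch versus an if containing a loop containing an if.
const flatSwitch: Block[] = [
  {
    kind: "switch",
    cases: [[{ kind: "stmt" }], [{ kind: "stmt" }], [{ kind: "stmt" }], [{ kind: "stmt" }]],
  },
];
const nested: Block[] = [
  { kind: "if", body: [{ kind: "loop", body: [{ kind: "if", body: [{ kind: "stmt" }] }] }] },
];

console.log(cyclomatic(flatSwitch), cognitive(flatSwitch)); // 4 1
console.log(cyclomatic(nested), cognitive(nested)); // 4 6
```

Both trees score 4 on cyclomatic complexity, but the flat switch scores 1 on this cognitive measure while the nested code scores 6, which matches the intuition that the switch is far easier to read.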
Let’s talk about that 20-line maximum method size I mentioned earlier. While it’s more of a guideline than an absolute rule, it is one based on a combination of cyclomatic complexity and cognitive complexity.
If your code is restricted to 20 lines long, it is unlikely to have many different branching statements aside from a simple switch statement. Similarly, 20 lines is brief enough for even the busiest developer to quickly grasp what a method is all about.
This number is one that I’ve found works well for me and my teams when working with C# or TypeScript code. It’s a tradeoff between maintainability and quality on one side and productivity on the other, and it’s a rule that’s reasonably easy to follow.
This number might vary based on your team or your language, but I encourage you to find something that works and go with it.
Ultimately, when we write code for maintainability and long term quality, we’re writing with complexity in mind. By reducing our cyclomatic complexity, we make it harder for bugs to live in our code and that helps us along the journey to make defects impossible.