Sociable Steve

Posted on Nov 20, 2021

Let's talk quality - Part 2

#programming #productivity #computerscience

In my previous post I talked about the different aspects of a more fully rounded view of quality beyond the standard code quality. In this post I'll be starting to talk about how to measure quality.

Each of the three areas I previously defined is large in its own right, so I'm going to talk about only one of them in this post: Development Quality.

What is Development Quality?

As a reminder, development quality is about measuring and understanding the effect of development activities on quality. We won't focus here on running software, or on whether it meets the users' needs.

My general definition here is:

Development Quality is about tracking the impact on quality of activities that occur before the system is put into operation, or released to the user.

What does Development Quality cover?

Certain aspects of the development quality space are the best understood of the entire quality space, since that's where engineers have spent most of their time and focus. There is a tendency to focus on what can be measured automatically and often, while getting feedback from the people who are impacted the most gets less attention.

The list of things to consider here could be endless, but I break it down into a small number of spaces to help manage things. What I consider to be within the realm of Development Quality are:

Writing and Maintaining code

Writing code, and looking after it, is the core of the entire development effort, and so has the largest direct impact on the quality of a system. It should therefore come as no surprise that this is the most well defined and supported area of quality across the entire software development industry.

Testing software

A core part of any good Quality program is testing: the act of ensuring that what has been built does what it's meant to, and does it well. Testing here covers all aspects of testing: unit testing, integration testing, usability testing and load testing. If you don't have a good testing strategy, and good coverage, then quality will be affected because problems won't be caught.

Developer and Tester Experience

Working on a system, either developing or testing it, is an experience. If the experience is a bad one, then people will want to do the quickest job possible and get out of there, leading to poor quality being introduced. There is also the additional complication of having to spend time dealing with a poor experience, rather than working on the task at hand, in an environment where time is already at a premium, which is most workplaces.

Ensuring that developers and testers have a good experience means that they will be more able to focus on the job at hand, and more likely to try and maintain that good experience and so keep quality higher.

Communication and Understanding

Understanding how a system works, what changes are happening to it, and what problems are being encountered is key to ensuring that quality is maintained and improved on a system.

Someone approaching a system they don't understand is likely to add code which isn't right for the system, introducing technical debt and reducing quality.

People who don't understand changes they are reviewing are more likely to simply sign off on a change, fearing that they might look stupid for asking questions that they consider obvious to other people.

Testers who don't understand what a change is for won't be able to properly test the change, because they aren't aware of the impact and edge-cases around it.

Good documentation and good team communication are an often overlooked part of system maintenance, with teams favoring doing over documenting. But as new team members join and old ones leave, the understanding of the system deteriorates unless good quality documentation is in place.

What are some metrics around Development Quality, and how can we measure them?

There is a whole plethora of metrics we could measure to understand the quality of our code, and whether we're doing things right. I've picked a few which, if measured, could have a big impact in a short time.

Cyclomatic complexity

A measure of how many paths there are through the code. The more paths there are, the more complex the code, and the more opportunity for errors to be introduced.

While people are working on complex pieces of code it can be easy for them to miss edge cases, or lose their train of thought if interrupted. Simpler code is easier to understand and to reason about when changing or testing it, and so it is easier to identify bugs earlier.

There are plenty of tools out there to help measure this. My own personal go-to in this space is SonarQube.
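To make the idea concrete, here is a minimal sketch of how a path count can be approximated, using Python's standard `ast` module. It is not how SonarQube calculates its figure; the function name and the choice of which constructs count as a decision point are my own assumptions for illustration.

```python
import ast

# Node types treated as a decision point (an extra path through the code).
# This selection is an assumption for the sketch; real tools differ in detail.
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                 ast.IfExp, ast.Assert, ast.comprehension)


def cyclomatic_complexity(source: str) -> int:
    """Approximate cyclomatic complexity as 1 + the number of decision points."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds one short-circuit path per extra operand.
            complexity += len(node.values) - 1
    return complexity


EXAMPLE = '''
def classify(n):
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    for d in range(2, n):
        if n % d == 0 and d > 1:
            return "composite"
    return "other"
'''

print(cyclomatic_complexity(EXAMPLE))
```

The example function has two `if` branches, a loop, a nested `if` and one `and`, so it scores five decision points plus the single base path.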

Code duplication

If there are blocks of code that are duplicated around the code-base then when it comes time to update those blocks of code, for example to introduce a new feature, or fix existing bugs, it's easy for similar blocks of code to be missed.

Personally I use the rule-of-three for many things, and in this instance once you end up with the third usage of something, it should be in a centralised method rather than duplicated around. Purists might tell you that you should move to a single place as soon as it's re-used, on the second instance, but there is a trade-off between developing at speed, and developing to perfection.

My go-to in this space is again SonarQube, though there are other tools which can help with this.
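As a rough illustration of the rule-of-three, the sketch below flags any block of consecutive lines that shows up three or more times across a set of files. It is a naive line-based comparison, not what a real duplication detector does, and the function name, window size and sample files are all assumptions.

```python
from collections import defaultdict


def duplicated_blocks(files: dict[str, str], window: int = 3,
                      min_occurrences: int = 3) -> dict[str, list[tuple[str, int]]]:
    """Map each repeated block of `window` non-blank lines to where it occurs.

    Only blocks seen at least `min_occurrences` times are returned.
    Locations are (filename, 1-based line number of the block's first line).
    """
    seen = defaultdict(list)
    for name, text in files.items():
        # Ignore indentation and blank lines, but remember original line numbers.
        lines = [(number, line.strip())
                 for number, line in enumerate(text.splitlines(), start=1)
                 if line.strip()]
        for start in range(len(lines) - window + 1):
            chunk = lines[start:start + window]
            block = "\n".join(content for _, content in chunk)
            seen[block].append((name, chunk[0][0]))
    return {block: places for block, places in seen.items()
            if len(places) >= min_occurrences}


# Hypothetical files that all repeat the same three lines.
FILES = {
    "a.py": "x = load()\ntotal = sum(x)\nprint(total)\nsave_a()\n",
    "b.py": "setup_b()\nx = load()\ntotal = sum(x)\nprint(total)\n",
    "c.py": "x = load()\ntotal = sum(x)\nprint(total)\n",
}

for block, places in duplicated_blocks(FILES).items():
    print(places)
    print(block)
```

Setting `min_occurrences` to 2 gives the purist version of the rule.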

Dependency management

It's no secret that within a short time of release of any software, it's already out of date. New vulnerabilities are found in third-party packages all the time, and security protocols are constantly updated. Performance patches are something to consider as well.

Ensuring we look after the dependencies helps us when it comes to larger upgrades. Rather than having to fight with the plethora of minor updates AND the major changes, the upgrades become far less onerous to do. Constant updating should be a part of Business As Usual (BAU) activities.

There is a plethora of tools in this space, and my recommendation is to use the one built into any platform you currently use where it exists, for example GitHub's Dependabot. If you don't have one available for your current tool-chain then Snyk (https://snyk.io/) is a good tool with a low barrier to entry.

Test Coverage

Most people, when asked, are likely to say that test coverage is the amount of code covered by automated tests; however, I disagree with this viewpoint. We build systems to meet user needs, and so we should be testing that the system we have built meets those needs. Test coverage in this scenario is about how many use-cases and edge-cases are covered by testing. Using this as a measure meets our definition of high quality systems more completely, specifically the part that states a high quality system meets the needs of the user.

That is not to say that measuring code coverage by automated testing doesn't hold merit, but the role of those metrics is more to ensure that we aren't introducing bugs at a smaller level, rather than ensuring we are meeting the users' needs.

Measuring test coverage in this sense isn't easy to do, and may be more of a manual effort. I've been familiar with a few options in this space but none really stand out. My go-to here is TestRail (https://www.gurock.com/testrail/) for documenting use-cases and the testing around those cases, but that's mainly because of familiarity.

Most people will find that this is a bit of a mental shift in how they approach testing, but it'll be worth the effort if it's improving the quality of the system overall, and everyone can understand what the use-cases of the system are just by looking through the test cases that are available.
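If your use-cases and their test cases are recorded somewhere, the measure itself is simple to calculate. The sketch below assumes you can export a mapping of use-case to linked test case IDs; the data shape, the names and the sample catalogue are all hypothetical and not tied to TestRail or any other tool.

```python
def use_case_coverage(tests_by_use_case: dict[str, list[str]]) -> float:
    """Fraction of use-cases that have at least one test case linked to them."""
    if not tests_by_use_case:
        return 0.0
    covered = sum(1 for tests in tests_by_use_case.values() if tests)
    return covered / len(tests_by_use_case)


def uncovered_use_cases(tests_by_use_case: dict[str, list[str]]) -> list[str]:
    """Use-cases with no linked test cases, sorted for stable reporting."""
    return sorted(name for name, tests in tests_by_use_case.items() if not tests)


# Hypothetical catalogue: use-case (or edge-case) -> linked test case IDs.
CATALOGUE = {
    "Customer resets password": ["TC-1", "TC-2"],
    "Customer resets password with an expired link": ["TC-3"],
    "Customer exports invoice as PDF": [],
    "Admin disables an account": ["TC-4"],
}

print(use_case_coverage(CATALOGUE))
print(uncovered_use_cases(CATALOGUE))
```

The list of uncovered use-cases is often more useful than the percentage, since it tells you where to write the next test.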

Code Review Quality

Code reviews are an important part of any development process, and are a learning opportunity for everyone involved. They are the first point at which people can get feedback from their peers about the quality of their work, and how they can improve.

Most times these are done by opening a pull-request (PR) on your platform of choice, and then an asynchronous review is performed. Understanding if a review is of high quality is difficult, and instead I offer a few insights into how to spot a low quality review.

The first thing to watch out for is PRs that are large. People don't have time, or want to spend the time, reviewing large pull requests, and so are more likely to accept lower quality than they would if the PR was smaller.

Next up is how much people are communicating about a review. If your PRs never have any comments then people aren't really reviewing them, and nobody is learning. Conversely, if people are spending a long time commenting on a PR, then they aren't necessarily communicating effectively. Should the comment thread be moved to an in-person chat? A large amount of communication is neither spoken nor written, and that is lost in translation on large threads.

Finally, how long does it take for reviews to take place? If it's hours or days that's probably fine, but if it's any longer then the original context may have been lost from the mind of the implementer, and so it becomes harder to understand and communicate about the intent of the change.

I don't have much in the way of tooling to measure this metric at the moment, but you can probably get the information out of any API that your platform of choice has. Additionally, asking people how they feel about the review process is important to understanding whether it's a sensible process that is talking about the right things. When asking questions, my advice is to ask for a rating on a scale (e.g. 0 - 10), which allows you to translate each answer into a data point that can be measured over time.
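Once you have pulled the raw numbers out of your platform, the three warning signs above can be checked mechanically. This is a sketch only: the `PullRequest` shape is hypothetical rather than any platform's real API response, and the thresholds are assumptions you would tune for your own team.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class PullRequest:
    """Hypothetical record; populate it from whatever your platform's API returns."""
    number: int
    lines_changed: int
    comment_count: int
    opened_at: datetime
    first_review_at: datetime


# Assumed thresholds, not recommendations.
MAX_LINES = 400
MAX_COMMENTS = 20
MAX_WAIT = timedelta(days=7)


def review_warnings(pr: PullRequest) -> list[str]:
    """Return the low-quality-review warning signs that apply to one PR."""
    warnings = []
    if pr.lines_changed > MAX_LINES:
        warnings.append("large PR")
    if pr.comment_count == 0:
        warnings.append("no discussion")
    elif pr.comment_count > MAX_COMMENTS:
        warnings.append("long thread")
    if pr.first_review_at - pr.opened_at > MAX_WAIT:
        warnings.append("slow review")
    return warnings


# Made-up sample data.
PRS = [
    PullRequest(1, 120, 4, datetime(2021, 11, 1, 9), datetime(2021, 11, 1, 15)),
    PullRequest(2, 1500, 0, datetime(2021, 11, 1, 9), datetime(2021, 11, 12, 9)),
    PullRequest(3, 80, 35, datetime(2021, 11, 2, 9), datetime(2021, 11, 2, 11)),
]

for pr in PRS:
    print(pr.number, review_warnings(pr))
```

Counting how many PRs trip each warning per month gives a trend you can track alongside the survey ratings.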

Production Bugs

A great measure of the quality of what we're outputting during the development cycle is how many bugs there are. More bugs means lower quality. Teams who are constantly delivering without fixing bugs or paying down tech debt will continue to introduce more bugs.

Most teams measure tasks in some kind of task tracking software like Jira, and these tools let you define what type of task something is. Using these tools to understand the number of bugs will help understand where the system is in terms of usability.

As with all metrics the output from this must be understood in the wider narrative of the system. For example a system which has 10 bugs opened a week but is doing 1000 releases a week is in a far better state than one which has 10 bugs opened a week but is only doing 1 release a month.

Measuring this should be a matter of using the API of the task tracking software you use and mapping that against the amount of change the system undergoes. The exact measure of change can vary, but may be related to the number of user stories completed, the number of releases, or the number of changed lines of code.
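As a small worked example of mapping bugs against change, the sketch below uses releases as the measure of change and replays the two systems from above over a four-week window. The function name is my own, and the four-week window is an assumption made so that the monthly release lines up with the weekly figures.

```python
def bugs_per_release(bugs_opened: int, releases: int) -> float:
    """Bugs opened in a period divided by the releases made in the same period."""
    if releases <= 0:
        raise ValueError("need at least one release in the period")
    return bugs_opened / releases


# Both systems open 10 bugs a week, so 40 bugs over four weeks.
frequent = bugs_per_release(bugs_opened=40, releases=4000)  # 1000 releases a week
infrequent = bugs_per_release(bugs_opened=40, releases=1)   # 1 release a month

print(frequent, infrequent)
```

The raw bug counts are identical, but the second system is producing several thousand times more bugs per release. Swapping releases for user stories or changed lines only changes the denominator.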

Pre-Production Bugs

The ideal time to catch a bug is before it gets to production, and using this as a metric helps to understand several things.

Firstly, if this metric is far lower than the Production Bugs metric, that's an indication that the testing process is failing in some way. This is not uncommon, especially if QA and production environments are very different, or the testing team is inexperienced.

Secondly, this metric can be a good indication of whether the development team is producing good quality output before testing. Are the reviews working well? Do engineers understand the system? A low quality system, as previously described, is likely to continue being a low quality system, and a high number of pre-production bugs caught is a good indication that developers don't want to be working on this system, or that they're constantly battling the tech debt on the system rather than implementing high-quality changes.

There are two ways you can measure this. Firstly, as with production bugs, usage of any task-tracking software can help. Tagging bug tickets with 'pre-production' lets you query for those tasks and get relevant metrics from them.

If you don't have that option, then measuring the life-cycle of a task is another option. One which goes back from testing to development is a great indication that pre-production bugs were found. The exact option you choose will depend on how you work.
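The life-cycle approach can be sketched as follows, assuming you can export each task's status history as an ordered list. The status names, task IDs and function names here are hypothetical; use whatever your own workflow calls these states.

```python
def went_back_to_development(status_history: list[str],
                             testing: str = "Testing",
                             development: str = "In Development") -> bool:
    """True if the task ever moved directly from testing back to development."""
    return any(before == testing and after == development
               for before, after in zip(status_history, status_history[1:]))


def pre_production_bug_rate(tasks: dict[str, list[str]]) -> float:
    """Fraction of tasks that bounced from testing back to development."""
    if not tasks:
        return 0.0
    bounced = sum(1 for history in tasks.values()
                  if went_back_to_development(history))
    return bounced / len(tasks)


# Hypothetical status histories, oldest status first.
TASKS = {
    "TASK-1": ["To Do", "In Development", "Testing", "Done"],
    "TASK-2": ["To Do", "In Development", "Testing", "In Development", "Testing", "Done"],
    "TASK-3": ["To Do", "In Development"],
    "TASK-4": ["To Do", "In Development", "Testing", "In Development"],
}

print(pre_production_bug_rate(TASKS))
```

This counts a task once no matter how many times it bounces; counting every bounce instead would be a reasonable variation.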

Developer and Tester Experience

As alluded to previously, a poor experience for developers and testers will lead to lower quality work. This is either because people are using their time overcoming technical barriers rather than doing quality work, or because they are in a bad environment and want to get out of it as quickly as possible, without spending a good amount of time ensuring quality.

There are a lot of different components to helping create better experiences for developers and testers. I've managed to list a few below:

Getting the code
Updating the system
Testing the change
Deploying the system
Finding help when required
Updating documentation

Understanding someone's experience is difficult, if not impossible, to measure in an automatic way, and so we need to be asking the engineers and testers directly. In order to turn answers into a measurable metric, the questions need to be asked as scale questions, rather than open-ended questions.

Some questions that could be asked (please feel free to ignore some, or add your own):

How easy was it for you to get the code for the system?
- Finding in source code repo server
- Downloading the code
- Accessing the repo
How easy was it to get started with development?
- Setting up local dependencies
- Required environment variables
How helpful was the documentation?
- Was it easy to find?
- Was it up to date?
- Did it make sense?
- How easy was it to update?
If you needed help, could you find someone with knowledge of the system?
- Were core contributors documented?
- Did you know how to contact any of the core contributors?
How confident were you when making changes that you weren't breaking things?
- Is the system architected in a way that makes it easy to understand how things interact?
- Were there sufficient unit and integration tests to give you confidence?
Was there documentation about how to contribute to the system?
- Did you understand any standards being followed?
- Did the PR process make sense?
- How long did it take to get feedback on any PRs?
- Was the feedback on any PRs appropriate and constructive?
- Was the branching strategy clear?
Was it clear how to deploy your changes?
- Was the deployment strategy clearly defined?
- How easy or difficult was it to get your changes deployed?
Were you able to validate your changes in a testing/QA environment completely before deploying to production?
- Is there a pre-production environment?
- Are any pre-production environments sufficiently similar to production that testing gives enough confidence that changes work as expected?

Asking questions is important, but you need to balance that with not asking too many questions, and not too often. People are happy to click a smiley face to give feedback, but not many people will fill in a 20 page questionnaire after every change. The frequency and measure of this metric is one which will need to be tuned and changed over time.
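To show how scale answers become a metric you can follow over time, here is a minimal sketch that averages ratings per question and per survey period. The 0 to 10 scale, the quarterly periods and the sample responses are all assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean


def average_scores(responses: list[tuple[str, str, int]]) -> dict[tuple[str, str], float]:
    """Average rating for each (question, period) pair.

    Each response is (period, question, rating), with rating on a 0-10 scale.
    """
    grouped = defaultdict(list)
    for period, question, rating in responses:
        if not 0 <= rating <= 10:
            raise ValueError(f"rating out of range: {rating}")
        grouped[(question, period)].append(rating)
    return {key: mean(ratings) for key, ratings in grouped.items()}


# Made-up survey responses.
RESPONSES = [
    ("2021-Q3", "How helpful was the documentation?", 4),
    ("2021-Q3", "How helpful was the documentation?", 6),
    ("2021-Q4", "How helpful was the documentation?", 7),
    ("2021-Q4", "How helpful was the documentation?", 8),
    ("2021-Q4", "Was it clear how to deploy your changes?", 3),
]

for (question, period), score in sorted(average_scores(RESPONSES).items()):
    print(period, question, score)
```

Comparing the same question across periods shows whether the experience is getting better or worse, which matters more than any single score.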

Summary

In this blog we've covered one of the three areas of Quality of an Engineered Software System. There's a lot here, and you shouldn't expect to measure everything at once, from day one. Cherry picking what you can do easily to make a start, completely excluding things which don't make sense, or adding in your own metrics are all very valid approaches.

The important point to take away is that you should be measuring things to know where you need to improve, and that Development Quality is about more than just code and more than just what you can measure automatically.

In the next blog I'll start to investigate the Operational aspect of quality.

DEV Community