The quality of engineered software systems is commonly overlooked by non-technical decision makers in companies of every size. There are two main reasons for this:
- Engineers don't know how to discuss the problem with non-technical stakeholders
- There is a lack of data, framed in terms non-technical decision makers can understand, about how quality issues affect engineering efforts
This is the first part of a series on how we can understand and talk about quality in a way that our non-technical colleagues can follow.
The current landscape
Since software quality is inherently something seen and driven by engineers, the current approach is to look at the problem in terms of what can be measured automatically and reported on quickly. The reports and information produced are technically focused, and offer little that non-technical stakeholders can understand.
Static analysis tools like SonarQube, Snyk, and CodeClimate do a great job of letting software engineers know where problems exist. Tools like Prometheus, New Relic, and the ELK stack help operations teams understand current issues in a running stack.
Beyond code quality
Over the past few months, and even years, I've been contemplating the scope and size of quality for engineered software systems. The first step is some kind of definition:
A high-quality software system is one that is easily maintainable, scalable, and extensible, and that meets the needs of the user.
For a single sentence, this definition covers a wide range of activities, and we still need a way to define how we measure and report on quality. That leads me to three broad categories of measures:
Development
The Development measure covers metrics related to the development process of the software system. This isn't just the code, but also the quality of the activities around the code, such as code reviews, QA activities, and developer experience. A rough sketch of how one of these can be measured follows the list below.
Some example measures:
- Cyclomatic Complexity
- Code Duplication
- Test Coverage
- PR Review Quality
- Production Bugs
- Pre-production Bugs
- Dependency Management
- Developer Experience
  - Documentation
  - Easy to get started
  - Overly monolithic / Hard to navigate
  - Ease of testing
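
As a minimal sketch of how mechanical some of these measures are underneath the tooling, here is an approximate cyclomatic complexity calculation over Python source, using only the standard library's `ast` module. This is my own rough approximation, not how SonarQube or any particular tool computes it, and the function names are purely illustrative.

```python
import ast

# Node types that add a decision point to a function's control flow.
DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)

def cyclomatic_complexity(func: ast.AST) -> int:
    """Rough cyclomatic complexity: 1 + the number of decision points."""
    complexity = 1
    for node in ast.walk(func):
        if isinstance(node, ast.BoolOp):
            # Each extra operand in an `and`/`or` chain adds a branch.
            complexity += len(node.values) - 1
        elif isinstance(node, DECISION_NODES):
            complexity += 1
    return complexity

def complexity_report(source: str) -> dict:
    """Map every function defined in `source` to its approximate complexity."""
    tree = ast.parse(source)
    return {
        node.name: cyclomatic_complexity(node)
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }

if __name__ == "__main__":
    sample = '''
def classify(x):
    if x < 0:
        return "negative"
    elif x == 0:
        return "zero"
    return "positive"
'''
    print(complexity_report(sample))  # {'classify': 3}
```

A real pipeline would run something like this (or an off-the-shelf tool) across the codebase on every merge and track the trend over time, rather than obsessing over any single number.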
Operational
The Operational measure covers metrics related to how the software runs in whatever environment it ends up in. This covers things like resource usage, scalability, observability, and issue recovery. Again, a small measurement sketch follows the list below.
Some example measures:
- Resource usage
  - CPU
  - Memory
  - Disk
  - Network
- Logging frequency
  - Overall log lines
  - Log lines per request
  - Log relevance
- Observability
  - Time to understand where an issue is occurring
  - Ability to understand usage patterns
- Scalability
  - How easily can the system be scaled?
- Alerting
  - Time to know when there is an issue
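
Not all of these need a full observability stack to get started. As a minimal sketch, assuming the application writes JSON-lines logs carrying a `request_id` field (both the format and the field name are assumptions here), log lines per request can be summarised with nothing but the standard library:

```python
import json
from collections import Counter
from statistics import mean

def log_lines_per_request(log_path: str) -> dict:
    """Summarise how many log lines each request produces.

    Expects a JSON-lines file where each entry may carry a `request_id`;
    lines without one (startup noise, etc.) are ignored.
    """
    per_request = Counter()
    with open(log_path, encoding="utf-8") as handle:
        for raw in handle:
            try:
                entry = json.loads(raw)
            except json.JSONDecodeError:
                continue  # skip non-JSON lines instead of failing the report
            request_id = entry.get("request_id")
            if request_id:
                per_request[request_id] += 1

    counts = list(per_request.values())
    return {
        "requests_seen": len(counts),
        "total_log_lines": sum(counts),
        "avg_lines_per_request": round(mean(counts), 2) if counts else 0,
        "noisiest_requests": per_request.most_common(5),
    }

if __name__ == "__main__":
    # "app.log.jsonl" is a placeholder path for illustration.
    print(log_lines_per_request("app.log.jsonl"))
```

In practice a log aggregator can give you this number directly; the point is that the measure itself is simple enough to explain to a non-technical audience.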
Sentiment
While we can measure how the operational and development teams are doing, there is also the reality that software systems are built to solve a problem. Understanding the sentiment of the system's consumers (users, customers, etc.) is also a great indicator of the overall quality of a system. A simple way to turn answers to questions like the ones below into a score is sketched after the list.
Some example measures:
- Does the system meet the needs of the user?
- Is the system easy to use?
- Is the system performance sufficient?
- Do you worry when the system is included in product development?
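
These are best gathered with a short, regular survey rather than a tool. As a minimal sketch, assuming the questions above are rephrased as statements scored on a 1-5 agreement scale (the wording and the scale are my assumptions), the responses can be rolled up into per-question averages:

```python
from statistics import mean

# Hypothetical survey: the sentiment questions above, phrased as statements
# scored from 1 (strongly disagree) to 5 (strongly agree).
QUESTIONS = [
    "The system meets my needs",
    "The system is easy to use",
    "The system's performance is sufficient",
    "I'm comfortable when this system is part of new product work",
]

def sentiment_summary(responses: list) -> dict:
    """Average score per question across all survey responses."""
    return {
        question: round(mean(answers[question] for answers in responses), 2)
        for question in QUESTIONS
    }

if __name__ == "__main__":
    survey = [
        {QUESTIONS[0]: 4, QUESTIONS[1]: 3, QUESTIONS[2]: 5, QUESTIONS[3]: 2},
        {QUESTIONS[0]: 5, QUESTIONS[1]: 4, QUESTIONS[2]: 4, QUESTIONS[3]: 3},
    ]
    print(sentiment_summary(survey))
```

Tracked over time, a falling score on any one question is an early warning that quality work is needed before it shows up in the other two categories.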
Summary
By considering quality as something more than just how well the code is written, we can build a better understanding of quality from many different angles, and see where we need to focus our efforts.
In the next post I'll cover how we can start measuring these metrics, before moving on to reporting!
Edit:
The second part of this series is now live over here