I am starting this miniseries as a continuation of my previous article on why I think legacy codebases should be tested. This part stays fairly general; later parts will focus on practical tips for working with a Java codebase.
The intent is clear: you should be able to benefit from this series if any of these points sound familiar:
- you are maintaining a large codebase
- you are in the process of refactoring old components (rewriting logic, upgrading)
- you have no access to code history (the original developers have left the company, the VCS history is unusable, the repository was migrated, etc.)
- you have no documentation, or it is outdated
In a situation like this you need a systematic approach. First of all, I recommend installing one of the tools for static code analysis. I cannot really tell why, but most legacy projects I've helped maintain or improve suffered from huge technical debt: code quality was an abstract term and static analysis was completely missing. I have found it useful to install a dedicated SonarQube instance accessible to your team. Quality gate setup will vary from project to project, but I strongly recommend leaving at least the standard configuration enabled.
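Pointing a scanner at such an instance takes only a few lines. Below is a minimal sketch of a `sonar-project.properties` file; the project key, source paths, and host URL are assumptions you would adjust to your own layout:

```properties
# sonar-project.properties -- minimal scanner configuration (example values)
sonar.projectKey=legacy-app
sonar.projectName=Legacy App
sonar.sources=src/main/java
# compiled classes are required for Java analysis
sonar.java.binaries=target/classes
# URL of the dedicated SonarQube instance mentioned above
sonar.host.url=http://localhost:9000
```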
Why is this important?
This analysis should give you insight into what is actually wrong with the code and where potential issues may occur. You can then create test scenarios targeting these issues: for instance, a potential NullPointerException or incorrectly closed streams. At this point, note them as red flags pointing you to the parts of the system that should be handled with higher priority.
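To illustrate, here is a minimal Java sketch of turning such red flags into focused regression tests before refactoring. `LegacyParser` is a hypothetical class invented for this example; the point is pinning down the null-returning contract and using try-with-resources so streams are always closed:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class LegacyParser {
    // Typical legacy contract flagged by static analysis:
    // returns null instead of throwing or returning an empty value.
    static String firstToken(String line) {
        if (line == null || line.isEmpty()) {
            return null; // callers that dereference this blindly risk an NPE
        }
        return line.split("\\s+")[0];
    }

    // try-with-resources guarantees the reader is closed even on failure,
    // fixing the "incorrectly closed stream" class of findings.
    static String firstLine(StringReader source) throws IOException {
        try (BufferedReader reader = new BufferedReader(source)) {
            return reader.readLine();
        }
    }
}

public class RedFlagTests {
    public static void main(String[] args) throws IOException {
        // Pin down the current null-returning behavior before changing it.
        if (LegacyParser.firstToken(null) != null) throw new AssertionError("null input");
        if (!"hello".equals(LegacyParser.firstToken("hello world"))) throw new AssertionError("token");
        if (!"line1".equals(LegacyParser.firstLine(new StringReader("line1\nline2")))) throw new AssertionError("line");
        System.out.println("ok");
    }
}
```

Tests like these do not fix anything by themselves, but they make the risky behavior explicit and protected before any rewrite starts.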
SonarQube can give you a pretty thorough report on coverage, so if your goal is to find good testing candidates, this is the place to start. It is also easy to set up, and the benefits (charts and such) are clearly visible even to non-technical people.
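If the build is Maven-based, coverage data usually reaches SonarQube via JaCoCo. A sketch of the plugin wiring follows; the version number is just a recent release and should be checked against your setup:

```xml
<!-- pom.xml: JaCoCo agent + report, so the coverage report can be imported -->
<plugin>
  <groupId>org.jacoco</groupId>
  <artifactId>jacoco-maven-plugin</artifactId>
  <version>0.8.8</version>
  <executions>
    <execution>
      <goals>
        <goal>prepare-agent</goal> <!-- attaches the agent to test runs -->
      </goals>
    </execution>
    <execution>
      <id>report</id>
      <phase>verify</phase>
      <goals>
        <goal>report</goal> <!-- writes the coverage report under target/site/jacoco -->
      </goals>
    </execution>
  </executions>
</plugin>
```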
You will also need to invest some effort in individual analysis. These are my usual observations about legacy code:
- code is not unit testable (large methods with no units, classes taking on too many responsibilities, missing dependency injection, complex class hierarchies, etc.)
- business logic is complicated and requires a lot of test data preparation
- external libraries or 3rd party applications are required to perform certain operations (external API, database, etc.)
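To show what mitigating the first point can look like, here is a minimal sketch of introducing a seam via constructor injection. `DiscountService` and its night-discount rule are invented for illustration; the pattern is replacing a hidden dependency (a direct clock call) with an injected one so the logic becomes a testable unit:

```java
import java.util.function.Supplier;

class DiscountService {
    // Injected seam: before the refactor, the hour came from LocalTime.now()
    // inside the method, which made the result depend on wall-clock time.
    private final Supplier<Integer> hourProvider;

    DiscountService(Supplier<Integer> hourProvider) {
        this.hourProvider = hourProvider;
    }

    // Now a pure, unit-testable rule: 10% discount during night hours.
    int discountPercent() {
        int hour = hourProvider.get();
        return (hour >= 20 || hour < 6) ? 10 : 0;
    }
}

public class SeamDemo {
    public static void main(String[] args) {
        // The test controls time by supplying a fixed hour.
        DiscountService night = new DiscountService(() -> 23);
        DiscountService day = new DiscountService(() -> 12);
        if (night.discountPercent() != 10) throw new AssertionError("night discount");
        if (day.discountPercent() != 0) throw new AssertionError("day discount");
        System.out.println("ok");
    }
}
```

The same move works for the third bullet: wrap the external API or database behind an interface and inject a fake in tests.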
All of these are potential deal-breakers when you try a simplistic approach. The problems above can be mitigated at some cost, but at this stage you should focus on what information you have about the system you want to test.
Whether it is a web application or an integration project, you should prepare scenarios that mimic real-life usage. Even if the implementation is not clear, it should at least be clear what the system does overall.
For example, if the application exposes public APIs, start by writing API tests. If it is a backend "data-processing" system, try to come up with possible test/expected data, and so on. Usually you should look for something I like to call the least obscure black box: a component with known functionality but unknown implementation.
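A characterization test of such a black box might look like the sketch below. `normalize` stands in for an arbitrary legacy component, and the expected pairs are assumed to come from observed real-life usage rather than from reading the code:

```java
import java.util.Map;

public class BlackboxCharacterization {
    // Hypothetical legacy component with a known contract:
    // normalizes order codes (trimmed, uppercased, spaces become dashes).
    static String normalize(String code) {
        return code.trim().toUpperCase().replace(' ', '-');
    }

    public static void main(String[] args) {
        // Input/expected pairs collected by observing the running system.
        Map<String, String> observed = Map.of(
                " ab 12 ", "AB-12",
                "x9", "X9");
        observed.forEach((input, expected) -> {
            String actual = normalize(input);
            if (!actual.equals(expected)) {
                throw new AssertionError(input + " -> " + actual);
            }
        });
        System.out.println("ok");
    }
}
```

The value of such tests is that they document the contract without committing you to the current implementation, so they survive a rewrite of the internals.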
Why is it important?
You won't achieve much without this kind of analysis. Testing mechanically may not discover problems in the implementation, since you can end up with tests that merely cement the wrong behavior. Whenever the expected behavior is clear, always go with scenarios.
Depending on your results, you should be able to come up with these isolated cases:
- functionality to be tested (formulated as task)
- current test coverage (generated report)
- possible errors in the implementation (a summary from static and individual code analysis) that will help prioritize the case
- individual remarks, depending on available knowledge (importance, frequency of usage, external dependencies, etc.)
With these cases prepared, you can start implementing the tests. In the next articles, I will target specific problems and possible solutions.