I am starting this miniseries as a continuation of my previous article on why I think legacy codebases should be tested. This part stays fairly general; later parts will focus on practical tips for working with a Java codebase.
The intent is clear: you should be able to benefit from this series if any of these points sound familiar:
- you are maintaining a large codebase
- you are in the process of refactoring old components (rewriting logic, upgrading)
- you have no access to code history (the original developers have left the company, the VCS history is unusable, the repository was migrated, etc.)
- you have no documentation, or it is outdated
In a situation like this you need a systematic approach. First of all, I recommend installing one of the tools for static code analysis. I cannot really tell why, but most legacy projects I've helped maintain or improve suffered from huge technical debt: code quality was an abstract term and static analysis was completely missing. I have found it useful to install a dedicated SonarQube instance accessible to your team. Quality gate setup will vary from project to project, but I strongly recommend leaving at least the standard configuration enabled.
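Pointing a scanner at such an instance takes only a few lines. Below is a minimal sketch of a `sonar-project.properties` file; the project key, source paths, and host URL are assumptions you would adjust to your own layout:

```properties
# sonar-project.properties -- minimal scanner configuration (example values)
sonar.projectKey=legacy-app
sonar.projectName=Legacy App
sonar.sources=src/main/java
# compiled classes are required for Java analysis
sonar.java.binaries=target/classes
# URL of the dedicated SonarQube instance mentioned above
sonar.host.url=http://localhost:9000
```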
Why is this important?
This analysis should give you insight into what is actually wrong with the code and where potential issues may occur. You can then create test scenarios targeting these issues: for instance, a potential NullPointerException or incorrectly closed streams. At this point, note them as red flags pointing you to the parts of the system that should be handled with higher priority.
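To illustrate, here is a minimal Java sketch of turning such red flags into focused regression tests before refactoring. `LegacyParser` is a hypothetical class invented for this example; the point is pinning down the null-returning contract and using try-with-resources so streams are always closed:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class LegacyParser {
    // Typical legacy contract flagged by static analysis:
    // returns null instead of throwing or returning an empty value.
    static String firstToken(String line) {
        if (line == null || line.isEmpty()) {
            return null; // callers that dereference this blindly risk an NPE
        }
        return line.split("\\s+")[0];
    }

    // try-with-resources guarantees the reader is closed even on failure,
    // fixing the "incorrectly closed stream" class of findings.
    static String firstLine(StringReader source) throws IOException {
        try (BufferedReader reader = new BufferedReader(source)) {
            return reader.readLine();
        }
    }
}

public class RedFlagTests {
    public static void main(String[] args) throws IOException {
        // Pin down the current null-returning behavior before changing it.
        if (LegacyParser.firstToken(null) != null) throw new AssertionError("null input");
        if (!"hello".equals(LegacyParser.firstToken("hello world"))) throw new AssertionError("token");
        if (!"line1".equals(LegacyParser.firstLine(new StringReader("line1\nline2")))) throw new AssertionError("line");
        System.out.println("ok");
    }
}
```

Tests like these do not fix anything by themselves, but they make the risky behavior explicit and protected before any rewrite starts.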
SonarQube can give you a pretty thorough report on coverage, so if your goal is to find good testing candidates, this is the place to start. It is also easy to set up, and the benefits (charts and such) are clearly visible even to non-technical people.
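If the build is Maven-based, coverage data usually reaches SonarQube via JaCoCo. A sketch of the plugin wiring follows; the version number is just a recent release and should be checked against your setup:

```xml
<!-- pom.xml: JaCoCo agent + report, so the coverage report can be imported -->
<plugin>
  <groupId>org.jacoco</groupId>
  <artifactId>jacoco-maven-plugin</artifactId>
  <version>0.8.8</version>
  <executions>
    <execution>
      <goals>
        <goal>prepare-agent</goal> <!-- attaches the agent to test runs -->
      </goals>
    </execution>
    <execution>
      <id>report</id>
      <phase>verify</phase>
      <goals>
        <goal>report</goal> <!-- writes the coverage report under target/site/jacoco -->
      </goals>
    </execution>
  </executions>
</plugin>
```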
You will also need to invest some effort in individual analysis. These are my usual observations about legacy code:
- code is not unit testable (large methods with no units, classes taking on too many responsibilities, missing dependency injection, complex class hierarchies, etc.)
- business logic is complicated and requires a lot of test data preparation
- external libraries or 3rd party applications are required to perform certain operations (external API, database, etc.)
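To show what mitigating the first point can look like, here is a minimal sketch of introducing a seam via constructor injection. `DiscountService` and its night-discount rule are invented for illustration; the pattern is replacing a hidden dependency (a direct clock call) with an injected one so the logic becomes a testable unit:

```java
import java.util.function.Supplier;

class DiscountService {
    // Injected seam: before the refactor, the hour came from LocalTime.now()
    // inside the method, which made the result depend on wall-clock time.
    private final Supplier<Integer> hourProvider;

    DiscountService(Supplier<Integer> hourProvider) {
        this.hourProvider = hourProvider;
    }

    // Now a pure, unit-testable rule: 10% discount during night hours.
    int discountPercent() {
        int hour = hourProvider.get();
        return (hour >= 20 || hour < 6) ? 10 : 0;
    }
}

public class SeamDemo {
    public static void main(String[] args) {
        // The test controls time by supplying a fixed hour.
        DiscountService night = new DiscountService(() -> 23);
        DiscountService day = new DiscountService(() -> 12);
        if (night.discountPercent() != 10) throw new AssertionError("night discount");
        if (day.discountPercent() != 0) throw new AssertionError("day discount");
        System.out.println("ok");
    }
}
```

The same move works for the third bullet: wrap the external API or database behind an interface and inject a fake in tests.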
All of these are potential deal-breakers when you try a simplistic approach. The problems above can be mitigated at some cost, but at this stage you should focus on what information you have about the system you want to test.
Whether it is a web application or an integration project, you should prepare scenarios that mimic real-life usage. Even if the implementation is not clear, it should at least be clear what the system does overall.
For example, if the application exposes public APIs, start by writing API tests. If it is a backend "data-processing" system, try to come up with possible test/expected data, and so on. Usually you should look for something I like to call the least obscure black box: a component with known functionality but unknown implementation.
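A characterization test of such a black box might look like the sketch below. `normalize` stands in for an arbitrary legacy component, and the expected pairs are assumed to come from observed real-life usage rather than from reading the code:

```java
import java.util.Map;

public class BlackboxCharacterization {
    // Hypothetical legacy component with a known contract:
    // normalizes order codes (trimmed, uppercased, spaces become dashes).
    static String normalize(String code) {
        return code.trim().toUpperCase().replace(' ', '-');
    }

    public static void main(String[] args) {
        // Input/expected pairs collected by observing the running system.
        Map<String, String> observed = Map.of(
                " ab 12 ", "AB-12",
                "x9", "X9");
        observed.forEach((input, expected) -> {
            String actual = normalize(input);
            if (!actual.equals(expected)) {
                throw new AssertionError(input + " -> " + actual);
            }
        });
        System.out.println("ok");
    }
}
```

The value of such tests is that they document the contract without committing you to the current implementation, so they survive a rewrite of the internals.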
Why is it important?
You won't achieve much without this kind of analysis. Testing mechanically may not discover problems in the implementation, since you can end up with tests that merely cement the wrong behavior. Whenever the expected behavior is clear, always go with scenarios.
Depending on your results, you should be able to come up with these isolated cases:
- functionality to be tested (formulated as task)
- current test coverage (generated report)
- possible errors in the implementation (a summary from static and individual code analysis) that will help prioritize the case
- individual remarks, depending on available knowledge (importance, frequency of usage, external dependencies, etc.)
With these cases prepared, you can start implementing the tests. In the next articles, I will target specific problems and possible solutions.