Mercedes Bernard

Posted on Jul 4, 2020 • Edited on Jul 6, 2020 • Originally published at mercedesbernard.com

Legacy application code review - Part 1: The big picture

#codequality #codereview #legacycode #estimating

This post is the first part in a series about what to look for when inheriting a legacy code base. This post will focus on looking at the big picture of the application and how the system fits together.

Legacy code bases get a bad rap, but they can be even more fun than working from a clean slate. Not only do you have to make decisions that are extensible in the long-term, but you do that within the delicate balancing act of not breaking what is already there.

Since there are more dependencies and considerations to take into account when working with an existing application, I recommend doing a full code review and audit of the application before making any plans for future functionality. Having as much knowledge as possible will help you make informed decisions about the best path forward.

In this post, I outline a variety of things I recommend investigating during this code review, what you learn from those findings, and how to adjust your recommendations and estimates.

Does it run?

Before you begin your code review, the very first thing to try is to run the application locally. If you’re unable to get the application running within an hour or two there may be a bigger issue hiding. And it’s not worth your valuable time to debug setup issues.

If you can’t get the code running following the documentation or standard industry practices, reach out to your stakeholders or previous developers. If the answer is that the code doesn’t run, then you should start having a larger conversation about stabilizing the development environment before even considering trying to estimate future functionality. It’s not prudent to estimate how much it will cost to build an addition on a house when the foundation is crumbling.

Language/framework audit

Once you have the code base running, check the language and framework version(s) of the application to determine its current support status. Running old versions of languages and frameworks can open the application up to unfixable bugs or unpatched security vulnerabilities. Using out-of-date versions may also limit your ability to deliver on big ideas if your stakeholders want modern features that aren’t possible with the old framework version they have.

Maintainers will deprecate older versions and no longer release updates after a certain date. Whether or not open source, these support timelines and policies are usually published publicly so you can plan your upgrades to stay ahead of deprecation and EOL (end of life).

For example, here’s a sample of open source and proprietary languages and frameworks just to give you an idea of what to expect.

Infrastructure audit

After you audit the backbone of the application, look at the supporting infrastructure to catalog the versions, support, and necessary dependencies. Pay special attention to the database but also look at other infrastructure driving the application such as storage services, caches, message brokers, etc. As before, you are auditing versions and if they are currently supported.

While it’s not required, I recommend drawing a diagram for yourself and your stakeholders showing how these pieces fit together. This is a brilliant place to add value as an engineer because often, no one understands the complete picture.

A diagram can help you technically by serving as a reference to the architecture of the system when you need to evaluate future services to integrate with. It can help your stakeholders gain an awareness of the complexity of their system, which helps form useful context when you propose a higher than expected estimate. Sometimes there’s even an added, immediate benefit of eliminating overhead by showing them they have unused services they are paying for that can be stopped!

While evaluating the infrastructure, make note of any interesting dependencies and considerations. For example, if you discover an Oracle database hooked up to the Rails app, do a little extra research to look into the ActiveRecord adapter for Oracle and document any limitations since Oracle isn’t a standard choice for a Rails app.

By paying attention to what infrastructure is there and how it’s configured, you can note any architectural considerations you may have when estimating and planning your project.

Environment audit

The next step in your code review is to learn about the various configured environments and their automated continuous integration and continue delivery (CI/CD) pipelines or lack of. Understanding the environments and how they fit together is key to planning your developer workflow and productivity and determining quality assurance (QA) and user acceptance testing (UAT) processes.

If there are no automated CI/CD pipelines configured, include time for setting those up in any estimates you provide. Automated CI/CD pipelines increase developer productivity by automating tedious parts of your work; decrease broken builds/deployments which limits regressions; increase code quality by enforcing that your code meets certain criteria before being deployed; and shortens time to market for your stakeholders by making it easier to deploy code early and often.

Data model and database design

It’s helpful to understand the project domain you’ll be working in by looking at the database and the relationships between the data models. Pay attention to how normalized the data is. Normalized data limits data redundancy. For most standard use cases, we want normalized data. If you find that the foreign key or association path to any single model in your database follows many routes, you have denormalized data.

There are tools out there that will generate an ERD (entity-relationship diagram) to make it easy to see these relationships. This is another place where you can start providing immediate value to your stakeholder if they do not already have their data relationships documented.

I like dbdiagram.io which lets you upload a variety of schema files and auto-generates a diagram for you. dbdiagram.io requires that your relationships are modeled in the database layer. If you find that most of your relationships are only modeled at the application layer (I’ve seen this happen in many Rails projects), you may need a different tool such as rails-erd, which will generate a diagram based on the ActiveRecord associations present in your code.

This is an example of denormalized relationships. Notice how many direct and indirect associations there are for the same relationships.

This is an example of mostly the same tables after they've been normalized. There are many fewer associations duplicating the same relationships.

Pay attention to what data constraints and validations are present. A strong combination of normalization and constraints protect the data from anomalies during insertion, update, and deletion that can introduce bugs into your application.

A database that is very denormalized or lacks constraints such as foreign keys or not nullable constraints informs you to plan time for remediating these concerns before adding new features. If you cannot make changes in the database, increase your estimate to allow yourself more time to write extra robust validation and error handling code. You’ll need to proactively prevent dirty data that could cause hard to find application bugs.

Conclusion

Understanding the various components of a system can help you understand the big picture and start to form system-level and architectural-level recommendations for your stakeholders. In the next post, we'll leave the big picture behind and dig into reviewing the application code itself.

Existing application code review checklist

Does it run?
Language/framework version support
Infrastructure version support - database, storage service, cache, message broker, etc.
Configured environments
CI/CD pipeline
Data model
Static analysis
1. Code base size
2. Class and method size
3. Cyclomatic complexity
4. Churn
5. Test coverage
Human analysis
1. Security audit
2. Code conventions and idioms
3. Dead code
4. Dependencies
5. Documentation
Dynamic analysis
1. Performance profiling

Top comments (1)

Doug Tangren • Jul 6 '20

Thanks for this write up. Not nearly as much that could be said about the common case ofndeally with legacy is actually written about. Love this