Discussion on: How do you get familiar with a new codebase?

In my career, I have found that most of the time I am working with a legacy codebase that has some combination of messy code, no tests, and no documentation. My main approach for figuring out codebases like this is as follows:

Learning legacy codebases

The four Rs of learning legacy codebases are Read, Refactor, Write Tests and Repeat. (Don't let facts get in the way of a good naming scheme.)

1: Read

Read through one specific part of the codebase. Try to grok as much of it as you can without touching anything.

2: Refactor

For long stretches of code that don't make sense, break them up into smaller methods until they do make sense. Is there a red-herring variable that does nothing? Remove it. Refactoring for readability helps you keep everything in your head at once.
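Here's a rough sketch of what that looks like in practice. The function names and the "SAVE10" coupon rule are made up purely for illustration; the point is that the behavior stays identical while the pieces get names you can hold in your head, and the unused variable disappears.

```python
# Before: a hypothetical legacy function doing everything inline.
def process_order_legacy(order):
    unused_flag = True  # red herring: assigned but never read
    total = 0
    for item in order["items"]:
        total += item["price"] * item["qty"]
    if order.get("coupon") == "SAVE10":
        total = total * 0.9
    return round(total, 2)


# After: same behavior, split into small named helpers (all hypothetical).
def subtotal(items):
    """Sum price * quantity over every line item."""
    return sum(item["price"] * item["qty"] for item in items)


def apply_coupon(total, coupon):
    """Give 10% off for the SAVE10 coupon; any other code changes nothing."""
    return total * 0.9 if coupon == "SAVE10" else total


def process_order(order):
    total = subtotal(order["items"])
    total = apply_coupon(total, order.get("coupon"))
    return round(total, 2)
```

Each helper now reads like a sentence, which makes the next step (checking that you actually understood it) much easier.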

3: Write tests

Write some tests for your newly refactored code. Is it actually doing what you thought it was doing? Test your hypothesis. If your hypothesis is right, you now have tests for the next person (or, more importantly, your future self). If your hypothesis is wrong, use the fifth of the four Rs of learning legacy codebases: Revert! (Don't even worry about it.)
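A minimal sketch of "tests as hypotheses", assuming a hypothetical `normalize_username` function you just pulled out of some legacy code and a pytest-style runner. Each test states one thing you believe the code does. The last one pins down a quirk you might not have expected, which is exactly the kind of thing these tests are good at surfacing.

```python
def normalize_username(name):
    # Hypothetical legacy behavior under study: trim, lowercase,
    # then turn every single space into an underscore.
    return name.strip().lower().replace(" ", "_")


def test_trims_and_lowercases():
    # Hypothesis: surrounding whitespace and capitals are dropped.
    assert normalize_username("  Alice ") == "alice"


def test_spaces_become_underscores():
    # Hypothesis: inner spaces are replaced with underscores.
    assert normalize_username("Mary Ann") == "mary_ann"


def test_double_spaces_are_not_collapsed():
    # Surprise documented: two spaces become two underscores.
    assert normalize_username("Mary  Ann") == "mary__ann"
```

Running this with `pytest` either confirms what you thought or tells you which assumption was wrong, and either way the next reader gets the answer for free.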

Reverting

You broke everything, good job. Now you know one more thing this code does not do. Revert the code you wrote and start again. Do not focus on what the code should do or should look like. This will only lead to anger or shame on your part. Instead, accept the code as it is. This is the way to code zen.

4: Repeat

Now repeat the steps until you know enough about the codebase to do what you need to.