Code Review: Best Practices

#java #csharp #codequality #codereview

The importance of Code Review in Software Development cannot be understated. A properly done Code Review will not only improve the quality of the code & identify potential issues at a very early stage, but also helps in shaping the Developer's skills. However, most of the time, Code Review mean, just a cursory glance at the changes and provide one or two comments. Not many do a detailed and through reviews. If you are looking for how to do a better job of reviewing the code, here are some guidelines that might help you. Before getting technical, let's discuss few non-technical stuff:

First point to understated is that Code Reviews can potentially become as a collision point between the Reviewer and the Developer. The Developers (especially during the formative years of the career) tend to be possessive about the code they write. However crappy their code is, they think that is the best code ever written. In cases like these, the Reviewer has to provide a little sugar coating in the comments. Also sitting next to the developer during the code review helps than doing a remote/disconnected code review.

Next, Code Review is NOT reviewing formatting or any static analysis manually. Each person will have a personal preference on items like Tab/spaces or brackets in the same line/next line, etc. Its best to settle it in some way (dictatorship of the senior guy mostly works, but yeah, fist fight is a solution too), and every one else to follow it. Best to integrate this in the Code Formatting preferences into your the favourite IDE, so that it formats as you type/save. Similarly you can do all the Static Code Analysis (such as potential null pointer exceptions or use of deprecated methods) via various tools and integrate the same into your IDE and build process.

Lastly, do not provide subjective review comments. It only leads to collisions. When you provide any review comment, if the comment/requested change is not obvious, provide a solid reasoning on the change is required. This helps a long way in the relationship between Reviewer and Developer, and also helps the Developer to understand the things he is not aware of.

Now lets see the goal of Code Reviews.

Make sure the change does addresses the functional requirement
If possible, identify any potential issues (off-by one errors, potential performance issue etc)
Is readable and maintainable

The following guidelines should be followed and verified in the Code Review:

Naming:

"A rose by any other name would smell as sweet". Well, lets give it to Shakespeare, that is true. The same logic cannot be applied to naming classes/methods/variables in programming. Say a method that deletes an entity from the database, is better named as Entity.delete(). Any other name like Entity.hide() or Entity.remove() is not the same correct and/or effective as delete().

One real life example. Years ago, I came across a method named Tree.seachForNode(name). It was implemented as

do a traversal of the nodes in the tree and if a node with the name is found, return the same

if not found, then create a node with that name; update the tree to have the node; insert the node into database and return the node

Imagine the confusion of the callers of that method. No one would have expected that a database insert will be result of a 'search' method. That is not the case, if the method was named as createIfNotExists().

So intention-revealing-names are a very important to understand and maintain the code base.

Falls in line with the architecture:

Software Architecture is about the design choices. What framework to use; what are the individual modules; how they interact with themselves; what are the libraries to use; what are the cross-cutting concerns & how to implemented.

Single Responsibility:

Every method and every class should do only one thing, and do it well. During a review, we should verify this really happens. In practical cases, where the single responsibility is not possible, we should ensure that the method is at least broken into smaller methods, each with one responsibility. In the above example, createIfNotExists() has two responsibilities (1) Search for a node, (2) Create a node. It would better if we can break the functionality into two different methods Tree.search() & Tree.create(), and the Tree.createIfNotExists() is simply a wrapper that calls both the methods.

Abstraction:

Each layer/class/method should have a proper abstraction.

In a layered architecture, each layer should effectively hide the complexities of the underlying layer, and provide an abstraction for the layers above. For example, a Data Access Layer will effectively handle the complexities of the Data Store (foreign keys, transactions, etc) and provide a logical model to work on. The Service Layer would take up the logical model and expose the functionalities. The UI/API layer will use these services and provide the UI or API. Now each layer has an abstraction. The Business layer should not work on the foreign keys or handling the HTTP Request/Response.

Likewise, if a class called BinaryTree is trying to parse a configuration file to read some configuration, that is also a leaky abstraction. Parsing and setting up the configuration and environment variables is best done by another class, while BinaryTree's abstraction should handle only the operations of a implementing a binary tree.

Provide correct level of Abstraction:

In Object Oriented Design, an abstraction provides you an easier way to interact with, hiding all its internals. Do not over expose more than what should be known to the clients. Examples:

Should that method/variable be really public? Or may be protected will do?
In some cases, a property should only be changeable within the class, while the clients should have read only access to the property. In that case, having a public getter method is required, but why a public setter method?
Does the getChildren() method returns List? What if the client modifies the list? Does the Tree class automatically adapts? May be we should be returning an Enumerable, so that the client can iterate and access the children, but can't modify the collection?

Provide right level of abstractions, so that there won't be any unexpected behaviours.

Method contract:

Every method will have a contract. The contract consists of Pre-Condition, Execution/Behaviour and a Post-Condition. For example, lets consider insert(TreeNode) method, that inserts a new record into a database. The potential Pre-Conditions are (1) TreeNode should not be null and (2) the name field should not be null. The Execution/Behaviour is inserting the row into database and Post-Condition is that a record in the database should exist with the given values.

Pre-Conditions must be validation before beginning of the Execution. Also the assumption is that the onus of satisfying the Pre-Conditions is on the callers of this method. Any failure in Pre-Condition, the method should result in an Exception. If everything goes right with the Execution, the method should ensure that the Post-Condition is met. If for any reason the Post-Condition cannot be met, again, should result in Exception.

A typical implementation for the insert method could be:

void insert(TreeNode node){  

//Pre-Condition 1  
if(node == null)  
   throw new IllegalArgumentException("Parameter 'node' cannot be null");  

//Pre-Condition 2  
if(node.getName() == null)  
   throw new IllegalArgumentException("'Name' of the node cannot be null");  


//Execution  
try(get the Database Connection){  
   // insert into record into the database  
}catch(DuplicatePrimaryKeyException e){  
   throw new InvalidDataException("A node with the name '"+node.getName()+"' already exists");  
}  
}

If you have noticed, while we validate the Pre-Conditions, valiation of the Post-Condition (whether the record exists in the database with the given values) is not done in here. This is typically verified in the Unit Tests for this method.

Choosing the right Data Type:

Choosing the right type for the given data is essential for reducing the bugs. Many times, I've come across strings used in place of enums. If the list of values are finite and known at compile time, why use string? Example: Gender.

More than enums, List is another widely and wrongly used type. Consider this:

List excludedExtensions = ReadFromConfig();

List is used to stored an ordered collection of values. Values can be duplicated too. In this cases, the file extentions to be excluded will be unique and the ordering is not important. They why use of List? Here Set is a better fit data type than a List.

Quick note on working with legacy codebases:

If you are working on a legacy code base, most probably it is not the cleanest code you have seen (if you want it without the sugar coating: its a total mess). The only way to clean it up is to do it bit by bit. Every commit you do should leave the code base cleaner than it was before. If that is not possible, at least ensure that it is not made even more dirtier. It takes a lot of patience and time to slowly fix it.