DEV Community

Cover image for When You Merge Pull Requests You Lose Knowledge
Marcelo Sousa for Reviewpad

Posted on • Originally published at reviewpad.com

When You Merge Pull Requests You Lose Knowledge

What happens to comments and discussions of a pull request when it is merged? Isn’t that knowledge valuable for the team? Why isn’t it readily accessible to future pull requests?

Pull requests (PRs) are a mainstream way of proposing changes to a codebase. One of the most important features around pull requests is the ability to perform a code review, allowing developers to inspect and comment on the proposed changes.

Frequently, the knowledge in these comments is invaluable to the team. Reviewers, which are familiar with the codebase and related technologies, will notice bad patterns, suggest optimisations, enforce team guidelines and best practices, etc.

However, as soon as you merge that pull request on GitHub, Gitlab, or Bitbucket, all this information is lost into the bag of closed PRs that you will rarely (if ever) inspect.

We all have had that déjà vu feeling when writing a comment in a code review: “Didn’t I write this comment already?” When preparing a PR sometimes we also have that feeling: “I think I’m forgetting something in this line of code.” Or better yet: “Didn’t someone already thought of this before?”

Let’s look at three scenarios.

1. Knowledge useful to a broader audience

Knowledge that could be beneficial to developers not involved in this codebase. For instance, this comment could be useful for other developers modifying similar code.

Discussions useful in the future

Discussion by kgabryje at apache / superset “feat(native-filters): add search all filter options #14710

2. Knowledge about design decisions

Teams with a lot of contributors can suffer from problems associated with the lack of context in design decisions. For instance, when a decision was taken on a discussion on a previous pull request. Consider the following discussion:

Useful knowledge about design decisions in discussions

Discussion with halspang and artursouza at dapr / dapr “Write reminders in multiple partitions #3297

Assuming that this PR is merged, when someone else modifies this code in the future, it would be useful to retrieve this discussion. In this case, that would be particularly relevant if this person starts the support for a decrementing path.

3. Review déjà vu

Senior developers that are active reviewers also experience what we call review déjà vus. These are situations where you are repeating similar comments in different PRs, like this one:

Review dejá vù in discussions

Comment by timotheecour at nim-lang / Nim “oids: switch from PRNG to random module #16203

These situations also occur when there are some hidden assumptions that do not make sense to document in the code.

Wouldn’t it be nice if we could re-use the comments from past reviews into future PRs without having to maintain them?

Enter Reviewpad

Reviewpad builds semantic diffs of changes in pull requests and associates metadata such as comments to these semantic objects. This is a big shift from the current behavior of existing code hosts such as Github, Gitlab, and Bitbucket. They associate comments to a line of a file in a particular git diff – as soon as the contents of the line change, this comment is marked as outdated.

We’ll showcase Reviewpad with the Web Crawler exercise of A Tour of Go using the linked GitHub repository. In this exercise of A Tour of Go, we are given a first implementation of a fake web crawler – the exercise is to parallelise it and prevent fetching the same URL twice.

Semantic git diffs

As a first step, we used Reviewpad to create a pull request with this first implementation.

Reviewpad screenshot

Reviewpad interface for pull request after review and merge

Reviewpad performs a relational static analysis (in the style of [1] and [2]) over the git diff associated with the pull request to build semantic diffs. These semantic diffs form what we call the Explore Tree in Reviewpad.

Reviewpad symbol tree

Explore tree for pull request

An explore tree is an annotated file tree of the modified files. For each modified file, we present an annotated semantic tree of the changes. For example, the file fakeFetcher.go as presented on GitHub requires an entire scan to understand the main symbols added: 1) the type fakeFetcher, 2) the struct fakeResult with two fields body and urls and 3) the method Fetch. Note that the method Fetch is shown as a child of the type fakeFetcher as it is the receiver type. At the moment, our analysers focus on types and methods – this is why we don’t show variable fetcher. The color in the beginning of the symbol indicates the type of modification: green for added, red for deleted and yellow for modified.

We show the explore tree on the left of the description and general conversation so that developers can quickly match the description and discussions to code changes.

Each element of the explore tree is clickable to its representation in the git diff. For example, clicking on the fakeFetcher type directs the user to the beginning of the symbol in the diff:

Reviewpad screenshot

Navigation from the explore tree to the code diff

Note: For those you noticed the coloured icon next to the file go.mod – this means that the file is automatically hidden from the review mode. Each Reviewpad user can tweak their file view in their review settings to exclude files.

In this PR, I did a first review to document parts of the code. For example, while the original code for Crawl is:

// Crawl uses fetcher to recursively crawl
// pages starting with url, to a maximum of depth.
func Crawl(url string, depth int, fetcher Fetcher) {
    // TODO: Fetch URLs in parallel.
    // TODO: Don't fetch the same URL twice.
    // This implementation doesn't do either:
...
}
Enter fullscreen mode Exit fullscreen mode

I chose to comment on the function as a code diff comment. These review comments are inserted into the knowledge base maintained by Reviewpad:

Reviewpad screenshot

Diff comment in Crawl function

Note that the comment in the explore tree appears as a child of Crawl symbol. This means that the comment was made in the definition of the symbol (not just in its signature).

Comments from past reviews

After the PR with the initial code was squash and merged, I created a new PR with a solution presented by the GitHub user harryhare.

Reviewpad screenshot

Reviewpad interface for pull request with first solution

The explore tree again shows the overview of the solution which involve the introduction of a SafeCounter struct with a map v to check if an URL was visited, a mutex mux and a method checkvisited that checks if URL was already visited or not.

The rest of the solution involves changes to the functions Crawl and main to fetch URLs in parallel. In the Crawl method, Reviewpad presents the comment from the past PR that documents the method. This way, developers can retrieve discussions from previous PRs that changed this method.

Reviewpad screenshot

Code diff view with comment from previous pull request

A user review raised an issue with the checkvisited which is fixed in the commit [fix from review](https://github.com/reviewpad/dejavus/pull/3/commits/a629a38fddad2305a13f2ce07da6a001c263920b).

A final review approves the PR with the request to remove the TODOs in the code.

In the final commit of this PR, we add the reference to the solution as a code comment in the Crawl method and remove the TODOs.

Reviewpad screenshot

Reviewpad interface for all changes in pull request

Hovering on the function Crawl in the explore tree displays the code documentation associated with this function for readability.

Keeping track of comments and changes over time

At this point you might be wondering:

  • What happens if the comment was made in a PR that was not the exact previous one?
  • What happens if I start refactoring the code?

To answer these questions, let’s go over the two open PRs in our project.

The first is an improvement proposed by another Github user that uses wait groups.

Reviewpad screenshot

Pull request proposing an improvement using wait groups

The comment shown in this pull request was the one made in the first PR. Reviewpad tracks comments regardless of their time difference. For example in the following PR from the google/error-prone project:

Reviewpad screenshot

Reviewpad review mode for pull request of the google/error-prone project. You can check it on Github here.

The PR was opened in September 2020 and it is changing class methods that were commented in January 2020.

The second open pull request demonstrates a beta version of our refactoring analyser:

Reviewpad screenshot

Pull request with a refactoring

In this PR, we introduced two commits, the first that changes the signature of the function Crawl by renaming the variable url to reqUrl and the second renames the file.

Reviewpad understands that it is the same function and presents the comment made in the first PR. Note that we are squash and merging these pull requests – the commit associated with the comment is not an ancestor of the commits in this PR.

Cool. How do I check this out by myself?

We have a public beta version of Reviewpad available at https://reviewpad.com/get-started/. You will need to create a new account and once you log in for the first time, you will see the following page to connect to a code host:

Reviewpad Screenshot

Connect to code host page on Reviewpad

You can connect to GitHub through our OAuth app or manually add a personal access token. The OAuth requires minimal scopes to be able to read and comment on public repositories.

And that’s it – you will be able to give Reviewpad a spin on public repositories! We will be adding more and more public repositories in the upcoming days – feel free to reach us on our community Slack with requests!

References

[1]: Cartesian hoare logic for verifying k-safety properties.

[2]: Verified three-way program merge.

Discussion (0)