Leo Pfeiffer

Insights into your git commits: Git Commit Analyzer

I love analytics and I love git - so I built a git commit analyzer 🙌

The web app works with your local git repository and also integrates with GitHub.

Try it out here or check out the repository on GitHub.

Line chart with groups

Overview

The Git Commit Analyzer reads your git log, parses it into its components and then allows you to explore it with a number of neat visualizations.

You can also choose to import a project directly from GitHub.

Implementation

If you're interested in how I implemented the web app, hopefully this section gives some insight.

From the landing page of the web app, the user can either choose to upload a git log from a local repository or continue with the GitHub integration. In each case, the data is fetched, validated, and the user can proceed to the dashboard. On the dashboard, the user can create custom visualizations for the repository.

Git Commit Analyzer Workflow

Technologies

The web app is implemented using Vue.js and JavaScript. I'm using Bulma as a pure CSS framework. The web app is deployed on the free tier of Netlify. I used Jest and Vue Test Utils for unit testing the implementation. The visualizations of the dashboard are implemented with Vue Plotly.

Setting up an OAuth workflow can be somewhat tedious. Luckily, Pizzly offers an amazingly simple way to take care of this. Pizzly provides a proxy server for over 80 OAuth integrations and you can deploy your own instance for free on Heroku (as I did).

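To summarize:

  • Frontend: Vue.js and JavaScript
  • CSS framework: Bulma
  • Deployment: Netlify (free tier)
  • Unit testing: Jest and Vue Test Utils
  • Visualizations: Vue Plotly
  • OAuth: Pizzly, self-hosted on Heroku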

Git log parsing

The raw git log file is split into individual commits using regular expressions and converted into Commit objects, which look something like this:

class Commit:
  hash: String
  authorName: String
  authorMail: String
  timestamp: Date
  message: String
  nodes: Array[Node]

class Node:
  path: String
  additions: Integer
  deletions: Integer
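
To make the splitting step more concrete, here is a minimal sketch of regex-based parsing that produces objects with the fields listed above. It is not the app's actual parser: it assumes the log was created with `git log --numstat --date=iso-strict`, and the names parseGitLog and parseCommit are made up for illustration.

```javascript
// Sketch only. Assumes the log was created with:
//   git log --numstat --date=iso-strict
// parseGitLog and parseCommit are hypothetical names.

function parseCommit(chunk) {
    const hash = chunk.match(/^commit ([0-9a-f]{40})/)[1]
    const author = chunk.match(/^Author:\s*(.*?)\s*<(.*?)>/m)
    const date = chunk.match(/^Date:\s*(\S+)/m)[1]

    // Message lines are indented by four spaces in git's default format
    const message = chunk
        .split("\n")
        .filter(line => line.startsWith("    "))
        .map(line => line.trim())
        .join("\n")

    // --numstat lines look like "<additions>\t<deletions>\t<path>";
    // binary files report "-" instead of a number (counted as 0 here)
    const nodes = [...chunk.matchAll(/^(\d+|-)\t(\d+|-)\t(.+)$/gm)].map(m => ({
        path: m[3],
        additions: m[1] === "-" ? 0 : Number(m[1]),
        deletions: m[2] === "-" ? 0 : Number(m[2])
    }))

    return {
        hash,
        authorName: author[1],
        authorMail: author[2],
        timestamp: new Date(date),
        message,
        nodes
    }
}

function parseGitLog(raw) {
    // Split in front of every line that starts a new commit
    return raw
        .split(/^(?=commit [0-9a-f]{40}\b)/m)
        .filter(chunk => chunk.trim() !== "")
        .map(parseCommit)
}
```

Counting binary files as zero additions and deletions is a simplification; a stricter parser might keep them as a separate flag.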

If the git log is imported from GitHub, the JSON history is converted into the Commit objects in a similar way.
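
For the GitHub path, the conversion could look roughly like the sketch below. It assumes the JSON shape that GitHub's REST API returns for commits (sha, commit.author, commit.message and, on single-commit responses only, files). fromGitHubCommit is a hypothetical helper name, not the app's code.

```javascript
// Sketch only: maps one commit object from the GitHub REST API
// to the Commit shape described above. fromGitHubCommit is a hypothetical name.
function fromGitHubCommit(item) {
    return {
        hash: item.sha,
        authorName: item.commit.author.name,
        authorMail: item.commit.author.email,
        timestamp: new Date(item.commit.author.date),
        message: item.commit.message,
        // `files` is only included when a single commit is requested,
        // so fall back to an empty list for entries of the commit list
        nodes: (item.files || []).map(file => ({
            path: file.filename,
            additions: file.additions,
            deletions: file.deletions
        }))
    }
}
```

Because both import paths end in the same object shape, everything downstream of the import can stay agnostic about where the data came from.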

GitHub integration

Importing the commit history from GitHub is generally pretty straightforward with the official GitHub API. However, setting up a full OAuth workflow in order to authenticate yourself to the GitHub API can make the process more laborious.

During development, I stumbled upon Pizzly, an open source service that acts as a proxy to over 80 commonly used APIs, among them GitHub. You can deploy your own Pizzly instance to Heroku for free and use it to manage all your API calls.

The OAuth workflow reduces to a few lines of code:

import Pizzly from "pizzly-js";

// get environment variables
const HOST = process.env.VUE_APP_PIZZLY_HOST
const SECRET = process.env.VUE_APP_PIZZLY_SECRET_KEY

// create pizzly instance and integration instance
const pizzly = new Pizzly({host: HOST, publishableKey: SECRET})
const githubApi = pizzly.integration('github')

/**
* Perform the OAuth workflow using the GitHub API.
* @return authId
**/
const authenticate = function() {
    return githubApi.connect()
}

Call the authenticate function, and Pizzly will take care of your authentication.

For example, to get the names of the repositories on a certain page of the API, you can call the following function. You also need to pass the authId returned by the authentication workflow.

/**
* Get the names of the repositories of a given page of the GitHub API.
* @param page (Number) page to get
* @param perPage (Number) entries per page
* @param authId (String) authentication ID from the auth workflow
* @return (Array) repository names 
**/
const getRepoPage = function(page, perPage, authId) {
    return githubApi
        .auth(authId)
        .get('/user/repos', {
            headers: {"Accept": "application/vnd.github.v3+json"},
            query: {"page": page, "per_page": perPage, "visibility": "all"}
        })
        .then(res => res.json())
        .then(jsn => jsn.map(e => e.name))
}

Pretty neat, don't you think?
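
Since the endpoint is paginated, collecting every repository name means requesting pages until one comes back short. The sketch below shows one way to do that. fetchAllRepoNames is a hypothetical helper, and it takes the page function as a parameter (assumed to behave like getRepoPage above) so it can be tried without a Pizzly instance.

```javascript
// Sketch only. `getPage` is assumed to have the same contract as getRepoPage:
//   (page, perPage, authId) => Promise resolving to an array of names
// fetchAllRepoNames is a hypothetical name.
async function fetchAllRepoNames(getPage, perPage, authId) {
    const names = []
    // GitHub page numbers start at 1
    for (let page = 1; ; page++) {
        const batch = await getPage(page, perPage, authId)
        names.push(...batch)
        // A short (or empty) page means there is nothing left to fetch
        if (batch.length < perPage) break
    }
    return names
}
```

With the function from above, a call might look like fetchAllRepoNames(getRepoPage, 100, authId). If the number of repositories is an exact multiple of perPage, this makes one extra request that returns an empty page.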

Data wrangling

When building the dashboard, I wanted to give the user as much freedom as possible to pick and choose which metrics to visualize. For a simple 2D plot, this means the user should be able to choose which variable lives on each axis and whether the data should be grouped by a third variable.

Implementing this was lots of fun! Using the parsed git log containing the Commit objects (as described above), I defined a number of functions that can be applied to an array of commits.

These functions fall into two categories: key and value functions.

Key functions take a Commit object and extract a certain key value (e.g. the hash, date, author etc.). Value functions take an array of Commit objects and summarize them by a single value (e.g. number of commits, additions, deletions).

With this setup, we can take an array of Commit objects, and aggregate it by a certain key function using a value function. For example, we could get the number of commits (value) per author (key).

Consider the following LogHandler class, which defines aggregateBy and groupBy as well as the value function vfNumCommits and the key function kfAuthorName.

class LogHandler {
    constructor(gitlog) {
        this.data = [...gitlog.log]
    }

    // Key function for name of author
    static kfAuthorName(obj) {
        return obj.authorName
    }

    // Value function for number of commits
    static vfNumCommits(array) {
        return array.length
    }

    /**
     * Group by a key function.
     * @param keyFunc: function to get the key per commit
     * */
    groupBy(keyFunc) {
        return this.data.reduce((agg, next) => {
            const curKeyValue = keyFunc(next)
            curKeyValue in agg ? agg[curKeyValue].push(next) : agg[curKeyValue] = [next]
            return agg
        }, {})
    }

    /**
     * Aggregator for top level keys of the Gitlog object.
     * @param keyFunc: function to get the key per commit
     * @param valueFunc: function to aggregate by
     * */
    aggregateBy(keyFunc, valueFunc) {
        const grouped = this.groupBy(keyFunc)
        Object.keys(grouped).forEach((k) => {
            grouped[k] = {
                value: valueFunc(grouped[k]),
            }
        })
        return grouped
    }
}

If we instantiate LogHandler with our git log and call aggregateBy(LogHandler.kfAuthorName, LogHandler.vfNumCommits), we get an object containing the number of commits per author, like this:

{
  "Alice" : {"value" : 42},
  "Bob" : {"value" : 13}
}

Now what if we wanted to further group these results by year, i.e. get the number of commits for each author for each year?

We can define another method in the LogHandler class, called groupAggregateBy, and a key function for the year, kfYear.

static kfYear(obj) {
    return obj.timestamp.getFullYear()
}

groupAggregateBy(groupFunc, keyFunc, valueFunc) {
    const grouped = this.data.reduce((agg, next) => {
        const curKey = [keyFunc(next), groupFunc(next)]
        curKey in agg ? agg[curKey].push(next) : agg[curKey] = [next]
        return agg
    }, {})
    Object.keys(grouped).forEach((k) => {
        grouped[k] = {
            key: keyFunc(grouped[k][0]),
            group: groupFunc(grouped[k][0]),
            value: valueFunc(grouped[k])
        }
    })
    return grouped
}

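The groupAggregateBy method takes an additional argument, groupFunc, which can be any key function. Each distinct value that groupFunc returns for the Commit objects defines one group. Commits are bucketed by the combination of key and group, so each entry of the result corresponds to one key-group pair.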

Continuing with our example, we would call groupAggregateBy(LogHandler.kfYear, LogHandler.kfAuthorName, LogHandler.vfNumCommits). Because the [key, group] array is coerced to a string when it is used as a property name, the result is keyed by strings such as "Alice,2022":

{
  "Alice,2022" : {"key": "Alice", "group": 2022, "value": 2},
  "Alice,2021" : {"key": "Alice", "group": 2021, "value": 30},
  "Alice,2020" : {"key": "Alice", "group": 2020, "value": 10},
  "Bob,2022" : {"key": "Bob", "group": 2022, "value": 10},
  "Bob,2019" : {"key": "Bob", "group": 2019, "value": 3}
}

Now, we simply need to implement a key and a value function for any key and value we want the user to have access to.
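
To illustrate what further key and value functions might look like, here is a small sketch in the style of kfAuthorName and vfNumCommits. The names kfMonth, vfAdditions and vfDeletions are hypothetical, and the functions are written as plain functions so they can be tried on their own. In the class they would be static methods.

```javascript
// Sketch only: hypothetical key and value functions operating on
// Commit objects with the fields described earlier.

// Key function: calendar month of the commit in local time, e.g. "2021-03"
function kfMonth(commit) {
    const date = commit.timestamp
    const month = String(date.getMonth() + 1).padStart(2, "0")
    return `${date.getFullYear()}-${month}`
}

// Helper: sum one numeric field over all nodes of all commits
function sumOverNodes(commits, field) {
    return commits.reduce(
        (total, commit) =>
            total + commit.nodes.reduce((sum, node) => sum + node[field], 0),
        0
    )
}

// Value function: total number of added lines
function vfAdditions(commits) {
    return sumOverNodes(commits, "additions")
}

// Value function: total number of deleted lines
function vfDeletions(commits) {
    return sumOverNodes(commits, "deletions")
}
```

Any pair of such functions can be passed to aggregateBy, for example to get the added lines per month.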

On the dashboard, the user can then select any of the defined functions. The selected functions are applied to the git log, and the transformed data set is used as input to the visualization.
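
As an illustration of that last step, the sketch below turns a result shaped like the groupAggregateBy output into Plotly traces, one trace per group. It only relies on the basic trace attributes x, y, name and type. toTraces is a hypothetical helper, and how the app actually feeds Vue Plotly may differ.

```javascript
// Sketch only: convert { "<key>,<group>": { key, group, value } }
// into an array of Plotly traces. toTraces is a hypothetical name.
function toTraces(aggregated, type = "scatter") {
    const byGroup = new Map()
    for (const { key, group, value } of Object.values(aggregated)) {
        // Create the trace for a group the first time it is seen
        if (!byGroup.has(group)) {
            byGroup.set(group, { name: String(group), type, x: [], y: [] })
        }
        const trace = byGroup.get(group)
        trace.x.push(key)
        trace.y.push(value)
    }
    return [...byGroup.values()]
}
```

Passing "bar" as the second argument would produce grouped bars instead of lines. Sorting the x values (for example by date) is left out to keep the sketch short.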

Conclusion and improvements

I had a lot of fun implementing the git commit analyzer and I love the insight I get from it.

There are a number of things that can still be improved:

  • Parsing file extensions: this would be a great enhancement to add information about languages used in the repo
  • Branch info: right now, branch information is ignored by the tool
  • Session persistence: right now, visualizations are lost during page refreshes
  • General UX improvements: I've noticed that users who visit the dashboard for the first time don't intuitively discover all of its features

Nonetheless, I hope the tool is fun to use and you can find new insights into your commit history!

Please feel free to reach out with feedback, comments, or ideas for improvements!

Screenshots

Landing page

Simple line chart

Line chart with groups

Sunburst chart

Bar chart

Bar chart with groups

Pie chart
