I love analytics and I love git - so I built a git commit analyzer 🙌
The web app works with your local git repository and also integrates with GitHub.
Try it out here or check out the repository on GitHub.
Overview
The Git Commit Analyzer reads your git log, parses it into its components and then allows you to explore it with a number of neat visualizations.
You can also choose to import a project directly from GitHub.
Implementation
If you're interested in how I implemented the web app, hopefully this section gives some insight.
From the landing page of the web app, the user can either choose to upload a git log from a local repository or continue with the GitHub integration. In each case, the data is fetched and validated, and the user can proceed to the dashboard. On the dashboard, the user can create custom visualizations for the repository.
Technologies
The web app is implemented using Vue.js and JavaScript. I'm using Bulma as a pure CSS framework. The web app is deployed on the free tier of Netlify. I used Jest and Vue Test Utils for unit testing the implementation. The visualizations of the dashboard are implemented with Vue Plotly.
Setting up an OAuth workflow can be somewhat tedious. Luckily, Pizzly offers an amazingly simple way to take care of this. Pizzly provides a proxy server for over 80 OAuth integrations and you can deploy your own instance for free on Heroku (as I did).
To summarize:
- JavaScript
- Vue.js as a web framework
- Vue Test Utils for testing
- Vue Plotly for visualizations
- Bulma as a CSS framework
- Pizzly to handle GitHub OAuth
- Netlify for deployment
Git log parsing
The raw git log file is split into individual commits using regular expressions and converted into `Commit` objects, which look something like this:

```
class Commit:
    hash: String
    authorName: String
    authorMail: String
    timestamp: Date
    message: String
    nodes: Array[Node]

class Node:
    path: String
    additions: Integer
    deletions: Integer
```

If the git log is imported from GitHub, the JSON history is converted into `Commit` objects in a similar way.
GitHub integration
Importing the commit history from GitHub is generally pretty straightforward with the official GitHub API. However, setting up a full OAuth workflow in order to authenticate yourself to the GitHub API can make the process more laborious.
During development, I stumbled upon Pizzly, an open-source service that acts as a proxy to over 80 commonly used APIs, GitHub among them. You can deploy your own Pizzly instance to Heroku for free and use it to manage all your API calls.
The OAuth workflow reduces to a few lines of code:
```javascript
import Pizzly from "pizzly-js";

// Get environment variables
const HOST = process.env.VUE_APP_PIZZLY_HOST
const SECRET = process.env.VUE_APP_PIZZLY_SECRET_KEY

// Create the Pizzly instance and the GitHub integration instance
const pizzly = new Pizzly({ host: HOST, publishableKey: SECRET })
const githubApi = pizzly.integration('github')

/**
 * Perform the OAuth workflow using the GitHub API.
 * @return authId
 **/
const authenticate = function() {
  return githubApi.connect()
}
```
Call the `authenticate` function, and Pizzly will take care of the authentication. For example, to get the names of the repositories from a certain page of the API, you can call the following function. You also need to pass the `authId` returned during the authentication workflow.
```javascript
/**
 * Get the names of the repositories of a given page of the GitHub API.
 * @param page (Number) page to get
 * @param perPage (Number) entries per page
 * @param authId (String) authentication ID from the auth workflow
 * @return (Array) repository names
 **/
const getRepoPage = function(page, perPage, authId) {
  return githubApi
    .auth(authId)
    .get('/user/repos', {
      headers: {"Content-Type": "application/vnd.github.v3+json"},
      query: {"page": page, "per_page": perPage, "visibility": "all"}
    })
    .then(res => res.json())
    .then(jsn => jsn.map(e => e.name))
}
```
Pretty neat, don't you think?
Data wrangling
When building the dashboard, I wanted to give the user as much freedom as possible to pick and choose which metrics to visualize. In terms of a simple 2D plot, this means the user should be able to choose which variable lives on each axis, as well as whether the data should be grouped by a third variable.
Implementing this was lots of fun! Using the parsed git log containing the `Commit` objects (as described above), I defined a number of functions that can be applied to an array of commits.
These functions fall into two categories: key functions and value functions.
Key functions take a `Commit` object and extract a certain key value (e.g. the hash, date, or author). Value functions take an array of `Commit` objects and summarize it with a single value (e.g. the number of commits, additions, or deletions).
With this setup, we can take an array of `Commit` objects and aggregate it by a certain key function using a value function. For example, we could get the number of commits (value) per author (key).
Consider the following `LogHandler` class, which defines `aggregateBy` and `groupBy`, as well as the value function `vfNumCommits` and the key function `kfAuthorName`.
```javascript
class LogHandler {
  constructor(gitlog) {
    this.data = [...gitlog.log]
  }

  // Key function for the name of the author
  static kfAuthorName(obj) {
    return obj.authorName
  }

  // Value function for the number of commits
  static vfNumCommits(array) {
    return array.length
  }

  /**
   * Group by a key function.
   * @param keyFunc: function to get the key per commit
   * */
  groupBy(keyFunc) {
    return this.data.reduce((agg, next) => {
      const curKeyValue = keyFunc(next)
      curKeyValue in agg ? agg[curKeyValue].push(next) : agg[curKeyValue] = [next]
      return agg
    }, {})
  }

  /**
   * Aggregator for top-level keys of the Gitlog object.
   * @param keyFunc: function to get the key per commit
   * @param valueFunc: function to aggregate by
   * */
  aggregateBy(keyFunc, valueFunc) {
    const grouped = this.groupBy(keyFunc)
    Object.keys(grouped).forEach((k) => {
      grouped[k] = {
        value: valueFunc(grouped[k]),
      }
    })
    return grouped
  }
}
```
If we instantiate `LogHandler` with our git log and call `aggregateBy(LogHandler.kfAuthorName, LogHandler.vfNumCommits)`, we get an object containing the number of commits per author, like this:
```json
{
  "Alice": { "value": 42 },
  "Bob": { "value": 13 }
}
```
Now, what if we wanted to further group these results by year, i.e. get the number of commits for each author for each year?
We can define another method in the `LogHandler` class, called `groupAggregateBy`, and a key function for the year, `kfYear`.
```javascript
static kfYear(obj) {
  return obj.timestamp.getFullYear()
}

groupAggregateBy(groupFunc, keyFunc, valueFunc) {
  const grouped = this.data.reduce((agg, next) => {
    const curKey = [keyFunc(next), groupFunc(next)]
    curKey in agg ? agg[curKey].push(next) : agg[curKey] = [next]
    return agg
  }, {})
  Object.keys(grouped).forEach((k) => {
    grouped[k] = {
      key: keyFunc(grouped[k][0]),
      group: groupFunc(grouped[k][0]),
      value: valueFunc(grouped[k])
    }
  })
  return grouped
}
```
The `groupAggregateBy` method takes an additional argument, `groupFunc`, which can be any key function. Each distinct value produced by applying the group function to the array of `Commit` objects defines one group.
Continuing with our example, we would call `groupAggregateBy(LogHandler.kfYear, LogHandler.kfAuthorName, LogHandler.vfNumCommits)`, which would result in the following object:
```json
{
  "[2022,Alice]": { "key": "Alice", "group": 2022, "value": 2 },
  "[2021,Alice]": { "key": "Alice", "group": 2021, "value": 30 },
  "[2020,Alice]": { "key": "Alice", "group": 2020, "value": 10 },
  "[2022,Bob]": { "key": "Bob", "group": 2022, "value": 10 },
  "[2019,Bob]": { "key": "Bob", "group": 2019, "value": 3 }
}
```
Now, we simply need to implement a key function and a value function for every key and value we want the user to have access to.
On the dashboard, the user can then select any of the defined functions. They are applied to the git log, and the transformed data set is used as input to the visualization.
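As a rough illustration of that last step, the grouped output shown above maps naturally onto plotting traces: one series per group, with the keys on the x-axis and the aggregated values on the y-axis. The following sketch is not the app's actual code; it only assumes the `{key, group, value}` shape produced by `groupAggregateBy` and a generic Plotly-style trace object.

```javascript
// Sketch: convert groupAggregateBy output into one Plotly-style trace
// per group (e.g. one bar series per year). The trace shape is the
// generic Plotly {name, x, y, type} form, an assumption for illustration.
function toPlotlyTraces(aggregated) {
  const traces = {}
  for (const { key, group, value } of Object.values(aggregated)) {
    if (!(group in traces)) {
      traces[group] = { name: String(group), x: [], y: [], type: "bar" }
    }
    traces[group].x.push(key)   // e.g. author name on the x-axis
    traces[group].y.push(value) // e.g. number of commits on the y-axis
  }
  return Object.values(traces)
}

const aggregated = {
  "[2021,Alice]": { key: "Alice", group: 2021, value: 30 },
  "[2021,Bob]": { key: "Bob", group: 2021, value: 5 },
  "[2022,Alice]": { key: "Alice", group: 2022, value: 2 },
}
const traces = toPlotlyTraces(aggregated)
```

An array like `traces` can then be handed to the plotting component as its data, which is what makes the key/value/group selection on the dashboard compose so cleanly.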
Conclusion and improvements
I had a lot of fun implementing the git commit analyzer and I love the insight I get from it.
There are a number of issues that can still be improved:
- Parsing file extensions: this would be a great enhancement to add information about languages used in the repo
- Branch info: right now, branch information is ignored by the tool
- Session persistence: right now, visualizations are lost during page refreshes
- General UX improvements: I've noticed that users who visit the dashboard for the first time don't intuitively discover all of its functions
Nonetheless, I hope the tool is fun to use and you can find new insights into your commit history!
Please feel free to reach out with feedback, comments, or ideas for improvements!