DEV Community

lbonanomi


Tallying Your Github Repo Views

You: are a data scientist harvesting Github visitor traffic stats for analysis.
I: am thirsty and insecure.
We are both interested in tallying Github's unique-visitor logs for all of our repos.

Collecting traffic data

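Github offers REST endpoints to query a repository's referrers, most-popular files, views (total or unique) and clone traffic, and most well-appointed Linux hosts offer the curl and jq utilities for polling them and parsing the returned JSON. Traffic data is only visible to accounts with push access to the repository, so every request has to be authenticated.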
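Each of those four reports lives under its own path beneath the repository's API URL. Here is a minimal sketch of how the URLs fit together. It assumes a repo URL of the form https://api.github.com/repos/OWNER/REPO, and traffic_urls and fetch_traffic are hypothetical helpers for illustration:

```shell
#!/bin/bash

# Hypothetical helper: print the four traffic endpoints for one repo.
# Assumes $1 looks like https://api.github.com/repos/OWNER/REPO.
traffic_urls() {
    local repo_url="$1"
    printf '%s/traffic/popular/referrers\n' "$repo_url"
    printf '%s/traffic/popular/paths\n' "$repo_url"
    printf '%s/traffic/views\n' "$repo_url"
    printf '%s/traffic/clones\n' "$repo_url"
}

# Hypothetical helper: fetch all four reports with the same `curl -sn`
# the script uses. Needs network access and push access to the repo.
fetch_traffic() {
    local endpoint
    traffic_urls "$1" | while read -r endpoint; do
        echo "== $endpoint"
        curl -sn "$endpoint"
        echo
    done
}
```

The views and clones reports are broken down per day by default, which is what the tally below relies on.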

#!/bin/bash

BUFF=$(mktemp)

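# Query the Github `user` endpoint to get the URL of the user's repo list.
# This is neater than hard-coding a username or URL into this script.
# -s silences curl's progress meter, -n reads credentials for
# api.github.com from ~/.netrc, and `jq -r` prints the raw string
# (no quotes to strip).
#
MYREPOS=$(curl -sn https://api.github.com/user | jq -r .repos_url)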

# Query $MYREPOS for each repository's API URL
# (`https://api.github.com/repos/$user/$repo`) and append `/traffic/views`.
# per_page=100 is the API maximum (the default is 30); more repos than
# that would need paging.
#
curl -sn "$MYREPOS?per_page=100" | jq -r '.[].url' | while read -r URL
do
    # Query the repo's `views` URL and parse the returned JSON for
    # unique-visitor counts by date. `.views[]?` skips repos whose reply
    # has no `views` array (an error message, say), and `.timestamp[:10]`
    # keeps only the YYYY-MM-DD part of the timestamp.
    curl -sn "$URL/traffic/views" |
        jq -r --arg url "$URL" '.views[]? | "\(.timestamp[:10]), \(.uniques), \($url)"'
done > "$BUFF"

# Now we should have a temp file that looks like:
#
# 2019-05-11, 1, https://api.github.com/repos/lbonanomi/go
# 2019-05-12, 1, https://api.github.com/repos/lbonanomi/go
# 2019-05-13, 1, https://api.github.com/repos/lbonanomi/go
#

# Let's leverage `seq` and GNU `date` to make a list of dates for the
# last 10 days and build a TSV. Github only keeps 14 days of traffic,
# and its timestamps are in UTC, hence `date -u`.

seq 0 9 | tac | while read -r SINCE
do
    DAY=$(date -u -d "$SINCE days ago" +%Y-%m-%d)

    # datestamp
    printf '%s\t' "$DAY"

    # Normally grep-before-awk makes me psychotic, but I think this is clearer.
    # If there's no data for $DAY, `t+0` makes awk report a '0'.
    grep "^$DAY," "$BUFF" | awk -F"," '{t=t+$2} END{print t+0}'
done

# Give a hoot and delete temp files when they're done.
rm -f "$BUFF"
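If re-grepping the buffer once per day bothers you too, the tally can be done in a single pass. This is a minimal sketch, assuming bash 4+ (for associative arrays), GNU date, and the "DATE, UNIQUES, URL" buffer format shown in the script's comments. tally_by_day is a hypothetical helper, and it takes the end date as an argument so it can be checked against known data:

```shell
#!/bin/bash

# Hypothetical helper: tally_by_day END_DATE DAYS < BUFFER
# Sums the uniques column per date, then prints DAYS tab-separated rows
# ending at END_DATE (YYYY-MM-DD), reporting 0 for days with no traffic.
tally_by_day() {
    local end="$1" days="$2" day uniques rest i
    local -A total=()

    # One pass over the buffer: IFS=', ' splits "2019-05-11, 1, https://..."
    # into the date, the unique-visitor count and the rest of the line.
    while IFS=', ' read -r day uniques rest; do
        [ -n "$day" ] || continue    # skip blank lines
        total[$day]=$(( ${total[$day]:-0} + uniques ))
    done

    # Walk from the oldest day up to END_DATE; -u keeps the date
    # arithmetic in UTC, like Github's timestamps.
    for (( i = days - 1; i >= 0; i-- )); do
        day=$(date -u -d "$end $i days ago" +%Y-%m-%d)
        printf '%s\t%d\n' "$day" "${total[$day]:-0}"
    done
}
```

In the script, `tally_by_day "$(date -u +%Y-%m-%d)" 10 < "$BUFF"` would stand in for the seq/grep/awk loop.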

Running our script should give us a tsv-formatted count of unique visitors to our personal Github presence in the last 10 days:

2019-05-16  5
2019-05-17  9
2019-05-18  1
2019-05-19  2
2019-05-20  0
2019-05-21  2
2019-05-22  2
2019-05-23  8
2019-05-24  3
2019-05-25  4

Wouldn't this look handsome as a spark graph or a line graph?
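A spark graph needs nothing beyond bash. This is a minimal sketch, assuming a UTF-8 terminal and the DATE/COUNT rows printed above. sparkline is a hypothetical helper, and scaling each count into one of eight block characters by integer division is just one reasonable choice:

```shell
#!/bin/bash

# Hypothetical helper: sparkline < "DATE COUNT" rows
# Prints one block character per row; the busiest day gets the tallest bar.
sparkline() {
    local -a ticks=(▁ ▂ ▃ ▄ ▅ ▆ ▇ █)
    local -a counts=()
    local day count n max=0 out=""

    # Default IFS splits on tabs or spaces, so either layout works.
    while read -r day count; do
        [ -n "$count" ] || continue
        counts+=("$count")
        if (( count > max )); then
            max=$count
        fi
    done

    # Scale each count to an index 0..7 into the ticks array.
    for n in "${counts[@]}"; do
        if (( max == 0 )); then
            out+=${ticks[0]}
        else
            out+=${ticks[n * 7 / max]}
        fi
    done
    printf '%s\n' "$out"
}
```

Piping the script's output into sparkline turns the ten rows above into the single line ▄█▁▂▁▂▂▇▃▄.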
