DEV Community

Antonio Feregrino

Plotting your Spotify data

Inspired by Nerudista's post on the Tacos de Datos website (in Spanish), I asked Spotify for my data and started making some plots with it. If you want to do the same, head over to this page; once you request your data, it will take a couple of days to become available. In the meantime, you can use this Google Colab: I've created a subset of my data for you to play with.

Among all the information you will receive, there will be some files named following this pattern: StreamingHistoryXX.json, and these are the ones we'll use throughout this post.

The data

The files mentioned above contain something like this:

[
  {
    "endTime" : "2019-02-04 17:14",
    "artistName" : "MGMT",
    "trackName" : "Time to Pretend",
    "msPlayed" : 261000
  },
  {
    "endTime" : "2019-02-04 17:18",
    "artistName" : "MGMT",
"..." 

Where the values are:

  • endTime: Date and time at which the song finished playing, in UTC.
  • artistName: Name of the artist of the song.
  • trackName: Name of the song.
  • msPlayed: For how long (in milliseconds) a song was played.

To load this data into a DataFrame, we'll need this tiny function:

from glob import glob
import json
import pandas as pd

def read_history():
    history = []
    for file in sorted(glob("StreamingHistory*.json")):
        with open(file, encoding="utf-8") as readable:
            history.extend(json.load(readable))
    history = pd.DataFrame(history)
    history["endTime"] = pd.to_datetime(history["endTime"])
    return history

streaming_history = read_history()
streaming_history.head(5)

Which should give us something like this once we execute it:

|   | endTime | artistName | trackName | msPlayed |
| --- | --- | --- | --- | --- |
| 0 | 2019-02-04 17:14:00 | MGMT | Time to Pretend | 261000 |
| 1 | 2019-02-04 17:18:00 | MGMT | When You Die | 263880 |
| 2 | 2019-02-04 17:23:00 | Mr Dooves | Super Smash Bros. Brawl Main Theme (From "Super Smash Bros. Brawl") [A Cappella] | 201518 |
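Because endTime is in UTC, a song played late in the evening can land on the following day. If that matters for your plots, you could shift the timestamps to your own time zone first. This is only a sketch: the sample rows are made up, "America/Mexico_City" is a placeholder for wherever you live, and the localTime and minutesPlayed columns are names I'm introducing here, not part of the export.

```python
import pandas as pd

# Made-up rows in the same shape as the export (times are in UTC).
sample = pd.DataFrame({
    "endTime": pd.to_datetime(["2019-02-04 17:14", "2019-02-05 03:30"]),
    "msPlayed": [261000, 263880],
})

# Placeholder time zone: replace it with your own.
local_tz = "America/Mexico_City"

# Mark the naive timestamps as UTC, then convert them to local time.
sample["localTime"] = (
    sample["endTime"].dt.tz_localize("UTC").dt.tz_convert(local_tz)
)

# Milliseconds are awkward to read, so keep a minutes column as well.
sample["minutesPlayed"] = sample["msPlayed"] / 60_000

print(sample[["endTime", "localTime", "minutesPlayed"]])
```

The second row finished at 03:30 UTC on the 5th, which is still the evening of the 4th in that time zone, so it would be counted on a different day.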

Histogram

I've always been a fan of the way GitHub shows developer contributions, and the data we got from Spotify seems like the perfect candidate to create something similar; however, we need to perform some transformations first.

Since we are not interested in the time of day at which each song finished, we'll drop the time part of endTime:

streaming_history["date"] = streaming_history["endTime"].dt.floor('d')

Then we'll get a count of songs per day using groupby:

by_date = streaming_history.groupby("date")[["trackName"]].count()
by_date = by_date.sort_index()
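To see what these two steps do, here is a small self-contained sketch on made-up rows (the track names and durations are invented for illustration). It also shows a variation you may prefer: summing msPlayed to get minutes listened per day instead of the number of songs.

```python
import pandas as pd

# Made-up plays: two on one day, one on the next.
plays = pd.DataFrame({
    "endTime": pd.to_datetime([
        "2019-02-04 17:14", "2019-02-04 17:18", "2019-02-05 09:00",
    ]),
    "trackName": ["Track A", "Track B", "Track C"],
    "msPlayed": [261000, 263880, 120000],
})

# Drop the time of day so every play on the same day shares one key.
plays["date"] = plays["endTime"].dt.floor("d")

# Number of songs per day, as in the post.
by_date = plays.groupby("date")[["trackName"]].count().sort_index()

# Variation: minutes listened per day.
minutes = plays.groupby("date")["msPlayed"].sum() / 60_000

print(by_date)
print(minutes)
```

Note that days on which nothing was played simply do not appear in the result; the plot further down relies on that to leave those cells empty.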

For our plot, we'll need to know which day of the week each date falls on and which week of the year it belongs to. The weekday property of the index gives us the former and isocalendar() the latter (the older week property was removed in pandas 2.0):

by_date["weekday"] = by_date.index.weekday
by_date["week"] = by_date.index.isocalendar().week.astype(int)

By the end, our dataframe should look like this:

| date | trackName | weekday | week |
| --- | --- | --- | --- |
| 2019-02-04 | 5 | 0 | 6 |
| 2019-02-05 | 8 | 1 | 6 |
| 2019-02-07 | 60 | 3 | 6 |

We have almost everything we need; the next step is to get a continuous sequence of numbers for the weeks. In the dataframe above, the 6th week of 2019 must become week 0, the 7th must become week 1, and so on. I can't think of a better way to do it than a for loop:

import numpy as np

week = 0
prev_week = by_date.iloc[0]["week"]
continuous_week = np.zeros(len(by_date)).astype(int)
for i, (_, row) in enumerate(by_date.iterrows()):
    if row["week"] != prev_week:
        # A new calendar week has started, so move on to the next column.
        week += 1
        prev_week = row["week"]
    continuous_week[i] = week
by_date["continuous_week"] = continuous_week
by_date.head()

The first rows now look like this:

| date | trackName | weekday | week | continuous_week |
| --- | --- | --- | --- | --- |
| 2019-02-04 | 5 | 0 | 6 | 0 |
| 2019-02-05 | 8 | 1 | 6 | 0 |
| 2019-02-07 | 60 | 3 | 6 | 0 |
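One caveat with the loop: it moves forward by exactly one whenever the week number changes, so a whole week without any plays would be squeezed out of the plot. A possible alternative, sketched here on made-up counts, is to count whole weeks since the Monday of the first week. This is my own variation rather than the approach used in the rest of the post.

```python
import pandas as pd

# Made-up daily counts; nothing was played during the week of Feb 11.
by_date = pd.DataFrame(
    {"trackName": [5, 8, 60, 12]},
    index=pd.to_datetime(
        ["2019-02-04", "2019-02-05", "2019-02-07", "2019-02-19"]
    ),
)
by_date.index.name = "date"

# Monday of the week in which the first play happened.
first_day = by_date.index.min()
first_monday = first_day - pd.Timedelta(days=first_day.weekday())

by_date["weekday"] = by_date.index.weekday
# Whole weeks elapsed since that Monday; silent weeks keep their slot.
by_date["continuous_week"] = (by_date.index - first_monday).days // 7

print(by_date)
```

Here the last row ends up in week 2, whereas the loop would have placed it in week 1. It also keeps working across a change of year, where the week number jumps back to 1.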

Our next step is to generate a matrix with days ✕ weeks as dimensions, where each entry is the number of songs we listened to on that day of that week:

songs = np.full((7, continuous_week.max()+1), np.nan)

for index, row in by_date.iterrows():
    songs[row["weekday"]][row["continuous_week"]] = row["trackName"]
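If you'd rather avoid the loop, NumPy can fill the whole matrix in one assignment by indexing with two arrays at once. The sketch below uses small made-up arrays; with the real data they would come from by_date["weekday"].to_numpy(), by_date["continuous_week"].to_numpy() and by_date["trackName"].to_numpy().

```python
import numpy as np

# Made-up stand-ins for the three columns of by_date.
weekday = np.array([0, 1, 3, 1])
continuous_week = np.array([0, 0, 0, 2])
counts = np.array([5, 8, 60, 12])

# Start with NaN everywhere so days without data stay empty.
songs = np.full((7, continuous_week.max() + 1), np.nan)

# Entry k goes to row weekday[k], column continuous_week[k].
songs[weekday, continuous_week] = counts

print(songs)
```

This works because every (weekday, week) pair appears at most once, so no assignment overwrites another.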

Now we could just plot the matrix songs using seaborn:

import matplotlib.pyplot as plt
import seaborn as sns

fig = plt.figure(figsize=(20,5))
ax = plt.subplot()
sns.heatmap(songs, ax=ax)

But the result is not that great:

[Image: horrible histogram]

If we want it to look better, we still need some code. The first thing to do is to clean up the axis labels:

import calendar
from datetime import timedelta

min_date = streaming_history["endTime"].min()
first_monday = min_date - timedelta(min_date.weekday())
# One Monday per column of the matrix, hence the + 1.
mons = [first_monday + timedelta(weeks=wk) for wk in range(continuous_week.max() + 1)]
# Show the month name only on the first week of each month.
x_labels = [calendar.month_abbr[mons[0].month]]
x_labels.extend([
    calendar.month_abbr[mons[i].month] if mons[i-1].month != mons[i].month else ""
    for i in range(1, len(mons))])

y_labels = ["Mon", "", "Wed", "", "Fri", "", "Sun"]

The X axis labels are far more complicated than the Y ones: unlike the Y axis, the X axis is neither fixed nor continuous, so its labels need to be calculated from the data. If you want a more detailed explanation, tell me in the comments or on Twitter at @feregri_no.
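The labelling rule is easier to follow on a tiny example. The sketch below wraps the same idea in a helper (month_labels is a name I made up for this illustration) and runs it on six weeks starting on Monday 4 February 2019.

```python
import calendar
from datetime import date, timedelta


def month_labels(first_monday, n_weeks):
    """Return one label per week: the month name where it changes, else ""."""
    mondays = [first_monday + timedelta(weeks=wk) for wk in range(n_weeks)]
    labels = []
    for i, monday in enumerate(mondays):
        # The first week always gets a label; later ones only on a new month.
        changed = i == 0 or mondays[i - 1].month != monday.month
        labels.append(calendar.month_abbr[monday.month] if changed else "")
    return labels


print(month_labels(date(2019, 2, 4), 6))
# ['Feb', '', '', '', 'Mar', '']
```

The Mondays are 4, 11, 18 and 25 February, then 4 and 11 March, so only the first and fifth columns get a label.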

After doing this, we'll make some changes to the colours and the axes:

fig = plt.figure(figsize=(20,5))
ax = plt.subplot()

ax.set_title("My year on Spotify", fontsize=20,pad=40)
ax.xaxis.tick_top()
ax.tick_params(axis='both', which='both',length=0)
ax.set_facecolor("#ebedf0") 
fig.patch.set_facecolor('white')

And finally, we can use seaborn's heatmap again, this time with a few extra arguments that I explain below:

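sns.heatmap(songs, linewidths=2, linecolor='white', square=True,
            mask=np.isnan(songs), cmap="Greens",
            vmin=0, vmax=100, cbar=False, ax=ax)

# Pin one tick per row and column so the labels always line up with the
# cells, even when seaborn thins out its automatic ticks.
ax.set_yticks(np.arange(len(y_labels)) + 0.5)
ax.set_yticklabels(y_labels, rotation=0)
ax.set_xticks(np.arange(len(x_labels)) + 0.5)
ax.set_xticklabels(x_labels, ha="left")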

The arguments are as follows:

  • songs: our matrix with shape days ✕ weeks with the counts of songs per day,
  • linewidths: the size of the spacing between each patch,
  • linecolor: the colour of the spacing between each patch,
  • square: this tells the function that we want to keep the aspect ratio 1:1 for each patch,
  • mask: a very interesting argument that lets us hide the patches for which there is no recorded value. It should be a boolean matrix with the same dimensions as the data being plotted, where each True means that the corresponding value must be masked,
  • cmap: the colormap to be used, luckily for us, the value "Greens" matches with the colour palette chosen by GitHub,
  • vmin: the value that should be considered as the minimum among our values,
  • vmax: the value that should be considered as the maximum among our values; I'd consider 100 to be the maximum, even though my record sits at 190 in a day!
  • cbar: a boolean value to indicate whether we want to show the colour bar that usually comes with the heatmap,
  • ax: the axes our plot should be plotted on.
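To get a feel for mask, vmin and vmax, here is a rough NumPy sketch of what happens to the counts before they are turned into colours. It is an illustration of linear scaling with clipping, not seaborn's actual code, and colour_position is a name I made up.

```python
import numpy as np


def colour_position(counts, vmin=0, vmax=100):
    """Scale counts linearly to [0, 1], clipping anything outside the range."""
    scaled = (np.asarray(counts, dtype=float) - vmin) / (vmax - vmin)
    return np.clip(scaled, 0.0, 1.0)


# Made-up day counts; NaN stands for a day with no data.
day_counts = np.array([0, 25, 100, 190, np.nan])

# True marks the cells that the mask would hide.
hidden = np.isnan(day_counts)

print(hidden)
print(colour_position(day_counts[~hidden]))
```

A day with 190 songs ends up at the same position as one with 100, which is why both get the darkest green. Lowering vmax makes ordinary days look darker; raising it keeps more contrast among the very busy ones.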

And voilà, our plot is ready:

[Image: pretty histogram]

It is up to you to modify the plot, maybe by adding information about the number of songs or showing the colorbar. A great idea would be to recreate this plot in a framework such as D3.js, but that may well belong in another post. Again, feel free to head over to this Colab and contact me via Twitter at @io_exception.
