<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Bauke Brenninkmeijer</title>
    <description>The latest articles on DEV Community by Bauke Brenninkmeijer (@baukebrenninkmeijer).</description>
    <link>https://dev.to/baukebrenninkmeijer</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F496158%2F9349c487-b38b-4743-9cbc-82b4befe1e66.jpeg</url>
      <title>DEV Community: Bauke Brenninkmeijer</title>
      <link>https://dev.to/baukebrenninkmeijer</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/baukebrenninkmeijer"/>
    <language>en</language>
    <item>
      <title>Analyzing my Spotify listening history 🎵 - Part 2</title>
      <dc:creator>Bauke Brenninkmeijer</dc:creator>
      <pubDate>Thu, 22 Oct 2020 20:55:58 +0000</pubDate>
      <link>https://dev.to/baukebrenninkmeijer/analyzing-my-spotify-listening-history-part-2-2c5i</link>
      <guid>https://dev.to/baukebrenninkmeijer/analyzing-my-spotify-listening-history-part-2-2c5i</guid>
      <description>&lt;p&gt;&lt;em&gt;This post was originally written on my personal website, where the charts and plots are interactive and I show the code to plot. If you are missing that, check out &lt;a href="https://www.baukebrenninkmeijer.nl/posts/spotify-listening-history-analysis-part-2.html" rel="noopener noreferrer"&gt;the original&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Short recap: Part 1
&lt;/h1&gt;

&lt;p&gt;In &lt;a href="https://dev.to/baukebrenninkmeijer/analyzing-my-spotify-listening-history-part-1-3jl4"&gt;part 1&lt;/a&gt; of this series we looked at the first part of this project. This included:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The data we are working with and what it looks like. &lt;/li&gt;
&lt;li&gt;The amount of listening done per year and per month.&lt;/li&gt;
&lt;li&gt;The amount of listening done per hour of day, also throughout the years. &lt;/li&gt;
&lt;li&gt;The amount of genres we have per song/artist. &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We will continue from where we left of, diving deeper into the genres. &lt;/p&gt;

&lt;p&gt;We'll load up the original JSON from Spotify, as well as the genres we created in part 1. We then combine them into comb, the combined dataframe. In genres.csv, we again see the 20 columns with the genres for each song, where the genres are collected from the artist, since songs are not labeled as having a genre. For more details, please have a look at part 1.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;ts&lt;/th&gt;
      &lt;th&gt;ms_played&lt;/th&gt;
      &lt;th&gt;conn_country&lt;/th&gt;
      &lt;th&gt;master_metadata_track_name&lt;/th&gt;
      &lt;th&gt;master_metadata_album_artist_name&lt;/th&gt;
      &lt;th&gt;master_metadata_album_album_name&lt;/th&gt;
      &lt;th&gt;reason_start&lt;/th&gt;
      &lt;th&gt;reason_end&lt;/th&gt;
      &lt;th&gt;shuffle&lt;/th&gt;
      &lt;th&gt;skipped&lt;/th&gt;
      &lt;th&gt;...&lt;/th&gt;
      &lt;th&gt;city&lt;/th&gt;
      &lt;th&gt;region&lt;/th&gt;
      &lt;th&gt;episode_name&lt;/th&gt;
      &lt;th&gt;episode_show_name&lt;/th&gt;
      &lt;th&gt;date&lt;/th&gt;
      &lt;th&gt;year&lt;/th&gt;
      &lt;th&gt;month&lt;/th&gt;
      &lt;th&gt;day&lt;/th&gt;
      &lt;th&gt;dow&lt;/th&gt;
      &lt;th&gt;hour&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt;0&lt;/th&gt;
      &lt;td&gt;2013-10-09 20:24:30+00:00&lt;/td&gt;
      &lt;td&gt;15010&lt;/td&gt;
      &lt;td&gt;NL&lt;/td&gt;
      &lt;td&gt;Wild for the Night (feat. Skrillex &amp;amp; Birdy Nam...&lt;/td&gt;
      &lt;td&gt;A$AP Rocky&lt;/td&gt;
      &lt;td&gt;LONG.LIVE.A$AP (Deluxe Version)&lt;/td&gt;
      &lt;td&gt;unknown&lt;/td&gt;
      &lt;td&gt;click-row&lt;/td&gt;
      &lt;td&gt;False&lt;/td&gt;
      &lt;td&gt;False&lt;/td&gt;
      &lt;td&gt;...&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;2013-10-09&lt;/td&gt;
      &lt;td&gt;2013&lt;/td&gt;
      &lt;td&gt;10&lt;/td&gt;
      &lt;td&gt;9&lt;/td&gt;
      &lt;td&gt;2&lt;/td&gt;
      &lt;td&gt;20&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Genres retrieved from Spotify and the combined dataframe. We rename the genres columns from just a number 0-20 to 'genre_x' with x between 0 and 20, so they're easier to recognize. &lt;/p&gt;

&lt;p&gt;&lt;code&gt;comb&lt;/code&gt; consists of &lt;code&gt;df&lt;/code&gt; + &lt;code&gt;genres_df&lt;/code&gt;, with the genre columns at the end.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# genres retrieved through Spotify API
&lt;/span&gt;&lt;span class="n"&gt;genres_df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;genres.csv&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;low_memory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;genres_df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;genres_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rename&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;genre_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;21&lt;/span&gt;&lt;span class="p"&gt;)})&lt;/span&gt;
&lt;span class="n"&gt;comb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;concat&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;genres_df&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;comb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;head&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;ts&lt;/th&gt;
      &lt;th&gt;ms_played&lt;/th&gt;
      &lt;th&gt;conn_country&lt;/th&gt;
      &lt;th&gt;master_metadata_track_name&lt;/th&gt;
      &lt;th&gt;master_metadata_album_artist_name&lt;/th&gt;
      &lt;th&gt;master_metadata_album_album_name&lt;/th&gt;
      &lt;th&gt;reason_start&lt;/th&gt;
      &lt;th&gt;reason_end&lt;/th&gt;
      &lt;th&gt;shuffle&lt;/th&gt;
      &lt;th&gt;skipped&lt;/th&gt;
      &lt;th&gt;...&lt;/th&gt;
      &lt;th&gt;genre_11&lt;/th&gt;
      &lt;th&gt;genre_12&lt;/th&gt;
      &lt;th&gt;genre_13&lt;/th&gt;
      &lt;th&gt;genre_14&lt;/th&gt;
      &lt;th&gt;genre_15&lt;/th&gt;
      &lt;th&gt;genre_16&lt;/th&gt;
      &lt;th&gt;genre_17&lt;/th&gt;
      &lt;th&gt;genre_18&lt;/th&gt;
      &lt;th&gt;genre_19&lt;/th&gt;
      &lt;th&gt;genre_20&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt;0&lt;/th&gt;
      &lt;td&gt;2013-10-09 20:24:30+00:00&lt;/td&gt;
      &lt;td&gt;15010&lt;/td&gt;
      &lt;td&gt;NL&lt;/td&gt;
      &lt;td&gt;Wild for the Night (feat. Skrillex &amp;amp; Birdy Nam...&lt;/td&gt;
      &lt;td&gt;A$AP Rocky&lt;/td&gt;
      &lt;td&gt;LONG.LIVE.A$AP (Deluxe Version)&lt;/td&gt;
      &lt;td&gt;unknown&lt;/td&gt;
      &lt;td&gt;click-row&lt;/td&gt;
      &lt;td&gt;False&lt;/td&gt;
      &lt;td&gt;False&lt;/td&gt;
      &lt;td&gt;...&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;1&lt;/th&gt;
      &lt;td&gt;2013-10-09 20:19:20+00:00&lt;/td&gt;
      &lt;td&gt;68139&lt;/td&gt;
      &lt;td&gt;NL&lt;/td&gt;
      &lt;td&gt;Buzzin'&lt;/td&gt;
      &lt;td&gt;OVERWERK&lt;/td&gt;
      &lt;td&gt;The Nthº&lt;/td&gt;
      &lt;td&gt;unknown&lt;/td&gt;
      &lt;td&gt;click-row&lt;/td&gt;
      &lt;td&gt;False&lt;/td&gt;
      &lt;td&gt;False&lt;/td&gt;
      &lt;td&gt;...&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h1&gt;
  
  
  Top genres
&lt;/h1&gt;

&lt;p&gt;In part 1, we have seen how many genres each song has and how their numbers are distributed. The next question then, naturally, is: What genres are they? So let's see! &lt;/p&gt;

&lt;p&gt;For the following analyses, remember that if I play 10 songs by Kanye, Kanye's genres will be present 10 times.&lt;/p&gt;

&lt;p&gt;To analyze the genres, I first create a dataframe that contains all of the genres and their counts. This will be handy in the near future.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;top_genres&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;genres_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Series&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value_counts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort_values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ascending&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reset_index&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rename&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;index&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;genre&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;count&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then we can plot. Lets start with the total listens per genre. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fj6wbq42lh56a4ziw4bqh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fj6wbq42lh56a4ziw4bqh.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;No big surprises here. My main music tastes are hip hop and electronic music, with main genres techno and drum and bass. However, for the latter two I mainly use Youtube, which hosts sets that Spotify does not have. So my Spotify is mainly dominated by hip hop and its related genres, like &lt;em&gt;rap&lt;/em&gt;, &lt;em&gt;hip hop&lt;/em&gt; and &lt;em&gt;pop rap&lt;/em&gt; (whatever that is? Drake maybe?). I expect many hip hop songs are also tagged as &lt;em&gt;pop&lt;/em&gt;, which would explain the high &lt;em&gt;pop&lt;/em&gt; presence, while I normally am not such a pop fan. Lets dive a bit deeper into this!&lt;/p&gt;

&lt;h2&gt;
  
  
  Top genres per song
&lt;/h2&gt;

&lt;p&gt;As a next step, let's verify which genres coincide with which other genres. This will test our hypothesis that &lt;em&gt;pop&lt;/em&gt; is used as a tag for &lt;em&gt;hip hop&lt;/em&gt;, but will also in general provide us with a better feeling of what genres are related to which other genres. &lt;/p&gt;

&lt;p&gt;For this,we loop over the rows and for each present genre, we put a 1 in that column, while also casting to &lt;code&gt;np.int8&lt;/code&gt;. This means that, instead of the normally 32 bits, we use 8 bits and thus safe some memory. Since we only wanna represent a binary state (present or not present), we could also use boolean. However, since we're doing arithmetic with it later, int8 will do. We fill the empty cells with 0. We only do this for the top 20 genres. This results in a dataframe with a column for each of the top 20 genres.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;rows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;comb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loc&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;genre_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;21&lt;/span&gt;&lt;span class="p"&gt;)]].&lt;/span&gt;&lt;span class="nf"&gt;iterrows&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;new_row&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;top_genres_20&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;new_row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;new_row&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;genre_presence&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;genre_presence&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;genre_presence&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fillna&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;int8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;genre_presence&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;head&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;hip hop&lt;/th&gt;
      &lt;th&gt;pop&lt;/th&gt;
      &lt;th&gt;pop rap&lt;/th&gt;
      &lt;th&gt;rap&lt;/th&gt;
      &lt;th&gt;edm&lt;/th&gt;
      &lt;th&gt;electro house&lt;/th&gt;
      &lt;th&gt;dance pop&lt;/th&gt;
      &lt;th&gt;tropical house&lt;/th&gt;
      &lt;th&gt;big room&lt;/th&gt;
      &lt;th&gt;brostep&lt;/th&gt;
      &lt;th&gt;bass trap&lt;/th&gt;
      &lt;th&gt;electronic trap&lt;/th&gt;
      &lt;th&gt;house&lt;/th&gt;
      &lt;th&gt;progressive electro house&lt;/th&gt;
      &lt;th&gt;progressive house&lt;/th&gt;
      &lt;th&gt;detroit hip hop&lt;/th&gt;
      &lt;th&gt;g funk&lt;/th&gt;
      &lt;th&gt;west coast rap&lt;/th&gt;
      &lt;th&gt;conscious hip hop&lt;/th&gt;
      &lt;th&gt;tech house&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt;0&lt;/th&gt;
      &lt;td&gt;1&lt;/td&gt;
      &lt;td&gt;1&lt;/td&gt;
      &lt;td&gt;1&lt;/td&gt;
      &lt;td&gt;1&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;1&lt;/th&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;1&lt;/td&gt;
      &lt;td&gt;1&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
      &lt;td&gt;0&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Now that we have this data, we can do a correlation analysis of when each genre coincides with what other genre. Now, because genre is a nominal data type, we cannot use the &lt;em&gt;standard correlation&lt;/em&gt;, which is the &lt;strong&gt;&lt;a href="https://en.wikipedia.org/wiki/Pearson_correlation_coefficient" rel="noopener noreferrer"&gt;Pearson correlation coefficient&lt;/a&gt;&lt;/strong&gt;. Instead, we should use a metric that works with nominal values. I choose &lt;strong&gt;&lt;a href="https://en.wikipedia.org/wiki/Kendall_rank_correlation_coefficient" rel="noopener noreferrer"&gt;Kendall's tau&lt;/a&gt;&lt;/strong&gt; for this, due to its simplicity. Normally, Kendall's tau is meant for &lt;em&gt;ordinal&lt;/em&gt; values (variables that have an ordering). However, because we are working with a binary situation (genre is either present or not) represented by 0 and 1, I think this should still work. One other thing to note is that Kendall's tau is &lt;em&gt;symmetric&lt;/em&gt;, and this means &lt;code&gt;tau(a, b)&lt;/code&gt; is the same as &lt;code&gt;tau(b, a)&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;Lets loop over all the combinations of the top 20 genres and compute their tau coefficient.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;scipy.stats&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;kendalltau&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;itertools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;product&lt;/span&gt;
&lt;span class="n"&gt;rows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;genre_a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;genre_b&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;product&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;genre_presence&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;repeat&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;tau&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;kendalltau&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;genre_presence&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;genre_a&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;genre_presence&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;genre_b&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;genre_a&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;genre_a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;genre_b&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;genre_b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;tau&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tau&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="n"&gt;tau_values&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;tau_values&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;genre_a&lt;/th&gt;
      &lt;th&gt;genre_b&lt;/th&gt;
      &lt;th&gt;tau&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt;0&lt;/th&gt;
      &lt;td&gt;hip hop&lt;/td&gt;
      &lt;td&gt;hip hop&lt;/td&gt;
      &lt;td&gt;1.000000&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;1&lt;/th&gt;
      &lt;td&gt;hip hop&lt;/td&gt;
      &lt;td&gt;pop&lt;/td&gt;
      &lt;td&gt;-0.040954&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If we plot this nicely, we get the following overview of "correlations". &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F341ioda3zp93hs1seecu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F341ioda3zp93hs1seecu.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We immediately can see some interesting clusters. We can see a strong tau between most of the electronic music genres, like &lt;em&gt;edm&lt;/em&gt;, &lt;em&gt;electro house&lt;/em&gt;, &lt;em&gt;bass trap&lt;/em&gt;, &lt;em&gt;big room&lt;/em&gt;, &lt;em&gt;brostep&lt;/em&gt; and &lt;em&gt;electronic trap&lt;/em&gt;. Then, looking at &lt;em&gt;hip hop&lt;/em&gt;, we can see very strong coefficients with &lt;em&gt;rap&lt;/em&gt; and &lt;em&gt;pop rap&lt;/em&gt;, neither of which are big suprises. My initial hypothesis that &lt;em&gt;pop&lt;/em&gt; would be correlated with hip hop has been debunked, though. &lt;em&gt;Pop&lt;/em&gt; seems to be more strongly related with &lt;em&gt;edm&lt;/em&gt; and some other electronic genres, and have a negative tau with hip hop related genres, like &lt;em&gt;hip hop&lt;/em&gt; (-0.29), &lt;em&gt;pop rap&lt;/em&gt; (-0.28) and &lt;em&gt;rap&lt;/em&gt; (-0.32). &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In this overview, I think there are two interesting insights still:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A strong coefficient between &lt;em&gt;conscious hip hop&lt;/em&gt; and &lt;em&gt;west coast rap&lt;/em&gt;. I did not really expect this, but can likely be attributed to artists like Kendrick Lamar, who deal with social and political issues in their lyrics. Additionally, cities like Compton played a big role in west coast hip hop, and were often strongly related to their social and economical situation (Also for Kendrick Lamar).&lt;/li&gt;
&lt;li&gt;A strong coefficient between &lt;em&gt;G-funk&lt;/em&gt; and &lt;em&gt;Detroit hip hop&lt;/em&gt;. G-funk is a is a subgenre of hip hop that originated in the west coast, while Detroit hip hop, as the name says, comes from Detroit. A strong coefficient between &lt;em&gt;G-funk&lt;/em&gt; and &lt;em&gt;west coast rap&lt;/em&gt; might have been more expected. Interesting to see, but I won't dive deeper into these findings for now. &lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Monthly change in genres 📅
&lt;/h1&gt;

&lt;p&gt;This is a very interesting analysis in my opinion, but also one of the more challenging one. I've approached the problem the following way, given the data I had. &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Count the frequency of each genre on a certain interval, monthly in this case.&lt;/li&gt;
&lt;li&gt;Divide these numbers by the total plays for those intervals, so we get a percentage of total plays of that month. This number means how much of the songs had that genre. This means that these percentages will not sum to one (or you know, they can, but they don't have to). &lt;/li&gt;
&lt;li&gt;Sort given these percentages and extract the monthly top 5.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Step 1&lt;/strong&gt;: count the frequency per interval. We don't do this just for the top n genres, but for &lt;strong&gt;all&lt;/strong&gt; genres. This, naturally, results in a lot of columns and a very wide dataframe.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Step 1. Count all genre occurences per month.
&lt;/span&gt;&lt;span class="n"&gt;counters_per_month&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="n"&gt;unique_years&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;comb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort_values&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;unique&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;unique_months&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;comb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;month&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort_values&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;unique&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;month&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;tqdm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;product&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;unique_years&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;unique_months&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;unique_years&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;unique_months&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;comb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loc&lt;/span&gt;&lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;comb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;comb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;month&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;month&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;counter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;year&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;month&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;month&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;comb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loc&lt;/span&gt;&lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;comb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;comb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;month&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;month&lt;/span&gt;&lt;span class="p"&gt;)].&lt;/span&gt;&lt;span class="nf"&gt;iterrows&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;genre&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;genre_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;21&lt;/span&gt;&lt;span class="p"&gt;)]]:&lt;/span&gt;  &lt;span class="c1"&gt;# the genre columns are named '0' to '20'.
&lt;/span&gt;                &lt;span class="n"&gt;counter&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;genre&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;counter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;genre&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
        &lt;span class="n"&gt;counters_per_month&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;counter&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Put the &lt;code&gt;counts_per_month&lt;/code&gt; in a dataframe and calculate the total songs played per month.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;counts_per_genre_per_month&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;counters_per_month&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    
&lt;span class="n"&gt;monthly_sum&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;year&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;month&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;size&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;reset_index&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;rename&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;count&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 2&lt;/strong&gt;: We then normalize all genre counts by the number of songs played in that time period.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# 2.Normalize all genre counts by the number of songs played in that time period. 
&lt;/span&gt;
&lt;span class="c1"&gt;# Select all columns except the time columns
&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;counts_per_genre_per_month&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tolist&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;remove&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;year&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;remove&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;month&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;monthly_sum&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;iterrows&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;counts_per_genre_per_month&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loc&lt;/span&gt;&lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;counts_per_genre_per_month&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;counts_per_genre_per_month&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;month&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;month&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;counts_per_genre_per_month&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loc&lt;/span&gt;&lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;counts_per_genre_per_month&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;counts_per_genre_per_month&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;month&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;month&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;count&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To get a cleaner visual, we remove any data before August 2016.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;counts_per_genre_per_month_filtered&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;counts_per_genre_per_month&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;counts_per_genre_per_month&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;2016&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
    &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
          &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;counts_per_genre_per_month&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;2016&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
          &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;counts_per_genre_per_month&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;month&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We now have a dataframe with 863 columns, which corresponds to 861 different genres. This dataframe has all the genres and what percentage of total plays they were present as a genre.  Keep in mind that an artist/song generally has more than one genre, so the sum of these fractions is not 1. This dataframe looks like this: &lt;/p&gt;

&lt;p&gt;Our data now looks as follows:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;year&lt;/th&gt;
      &lt;th&gt;month&lt;/th&gt;
      &lt;th&gt;east coast hip hop&lt;/th&gt;
      &lt;th&gt;hip hop&lt;/th&gt;
      &lt;th&gt;pop&lt;/th&gt;
      &lt;th&gt;pop rap&lt;/th&gt;
      &lt;th&gt;rap&lt;/th&gt;
      &lt;th&gt;trap music&lt;/th&gt;
      &lt;th&gt;NaN&lt;/th&gt;
      &lt;th&gt;catstep&lt;/th&gt;
      &lt;th&gt;...&lt;/th&gt;
      &lt;th&gt;classical soprano&lt;/th&gt;
      &lt;th&gt;spanish hip hop&lt;/th&gt;
      &lt;th&gt;trap espanol&lt;/th&gt;
      &lt;th&gt;pop reggaeton&lt;/th&gt;
      &lt;th&gt;chinese hip hop&lt;/th&gt;
      &lt;th&gt;corrido&lt;/th&gt;
      &lt;th&gt;regional mexican pop&lt;/th&gt;
      &lt;th&gt;australian indigenous&lt;/th&gt;
      &lt;th&gt;witch house&lt;/th&gt;
      &lt;th&gt;ghettotech&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt;16&lt;/th&gt;
      &lt;td&gt;2016&lt;/td&gt;
      &lt;td&gt;9&lt;/td&gt;
      &lt;td&gt;0.038760&lt;/td&gt;
      &lt;td&gt;0.449612&lt;/td&gt;
      &lt;td&gt;0.387597&lt;/td&gt;
      &lt;td&gt;0.488372&lt;/td&gt;
      &lt;td&gt;0.519380&lt;/td&gt;
      &lt;td&gt;0.069767&lt;/td&gt;
      &lt;td&gt;16.689922&lt;/td&gt;
      &lt;td&gt;0.007752&lt;/td&gt;
      &lt;td&gt;...&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;17&lt;/th&gt;
      &lt;td&gt;2016&lt;/td&gt;
      &lt;td&gt;10&lt;/td&gt;
      &lt;td&gt;0.055409&lt;/td&gt;
      &lt;td&gt;0.313984&lt;/td&gt;
      &lt;td&gt;0.343008&lt;/td&gt;
      &lt;td&gt;0.279683&lt;/td&gt;
      &lt;td&gt;0.337731&lt;/td&gt;
      &lt;td&gt;0.036939&lt;/td&gt;
      &lt;td&gt;16.469657&lt;/td&gt;
      &lt;td&gt;0.026385&lt;/td&gt;
      &lt;td&gt;...&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Step 3&lt;/strong&gt;: Sort given these values and extract the top 5. Unfortunately, the data is not in a shape that we can do that (to my knowledge at least), so we need to transform it a bit further by moving from a wide to a long data format and filtering out some values. &lt;/p&gt;

&lt;p&gt;The melting of the dataframe results in a single row per percentage per genre per timeunit. This makes it easier to plot with Altair. Furthermore, we create a datetime column from our year + month columns, which is also better for Altair to use.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;counts_per_genre_per_month_melted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;melt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;counts_per_genre_per_month_filtered&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;id_vars&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;year&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;month&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; 
    &lt;span class="n"&gt;value_vars&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;var_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;genre&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;value_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;percentage&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;counts_per_genre_per_month_melted&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;datetime&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_datetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;counts_per_genre_per_month_melted&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;month&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
    &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;-&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;counts_per_genre_per_month_melted&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; 
    &lt;span class="nb"&gt;format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;%m-%Y&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Drop columns where either the genre or percentage is Nan. This reduces the number of rows even more, so that taking the n-largest later will be faster.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;counts_per_genre_per_month_melted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;counts_per_genre_per_month_melted&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dropna&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;subset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;percentage&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;genre&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Which gives us:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;year&lt;/th&gt;
      &lt;th&gt;month&lt;/th&gt;
      &lt;th&gt;genre&lt;/th&gt;
      &lt;th&gt;percentage&lt;/th&gt;
      &lt;th&gt;datetime&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt;0&lt;/th&gt;
      &lt;td&gt;2016&lt;/td&gt;
      &lt;td&gt;9&lt;/td&gt;
      &lt;td&gt;east coast hip hop&lt;/td&gt;
      &lt;td&gt;0.038760&lt;/td&gt;
      &lt;td&gt;2016-09-01&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;1&lt;/th&gt;
      &lt;td&gt;2016&lt;/td&gt;
      &lt;td&gt;10&lt;/td&gt;
      &lt;td&gt;east coast hip hop&lt;/td&gt;
      &lt;td&gt;0.055409&lt;/td&gt;
      &lt;td&gt;2016-10-01&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This looks great! But, there is one problem, and that is that we likely have way too many rows for Altair. We have almost 7k rows, while Altair's maximum is 5k. Not too bad, but we still need to remove a bunch of rows. But that is fine, since we're only interested in the top 5 of each month anyway. Using pandas' &lt;code&gt;.groupby&lt;/code&gt; and &lt;code&gt;.nlargest&lt;/code&gt;, we can extract this fairly easy. We extract those the indices of the remaining rows and index into the melted dataframe to only have the rows in the top 5 for each month left.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;top_genres_per_month_with_perc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;counts_per_genre_per_month_melted&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loc&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;counts_per_genre_per_month_melted&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;year&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;month&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;percentage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;nlargest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reset_index&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;level_2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;:]&lt;/span&gt;

&lt;span class="n"&gt;top_genres_per_month_with_perc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;year&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;month&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;head&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;genre&lt;/th&gt;
      &lt;th&gt;percentage&lt;/th&gt;
      &lt;th&gt;datetime&lt;/th&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;year&lt;/th&gt;
      &lt;th&gt;month&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th rowspan="5"&gt;2016&lt;/th&gt;
      &lt;th&gt;9&lt;/th&gt;
      &lt;td&gt;rap&lt;/td&gt;
      &lt;td&gt;0.519380&lt;/td&gt;
      &lt;td&gt;2016-09-01&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;9&lt;/th&gt;
      &lt;td&gt;pop rap&lt;/td&gt;
      &lt;td&gt;0.488372&lt;/td&gt;
      &lt;td&gt;2016-09-01&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;9&lt;/th&gt;
      &lt;td&gt;hip hop&lt;/td&gt;
      &lt;td&gt;0.449612&lt;/td&gt;
      &lt;td&gt;2016-09-01&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;9&lt;/th&gt;
      &lt;td&gt;pop&lt;/td&gt;
      &lt;td&gt;0.387597&lt;/td&gt;
      &lt;td&gt;2016-09-01&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;9&lt;/th&gt;
      &lt;td&gt;indie pop rap&lt;/td&gt;
      &lt;td&gt;0.131783&lt;/td&gt;
      &lt;td&gt;2016-09-01&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;And we only have 145 rows left, so we can use it with Altair 😎.&lt;/p&gt;

&lt;p&gt;In the chart below, there is a lot going on. On the x-axis we have time while on the y-axis we have the normalized percentages of the top 5 genres. This means that for each month, the top 5 genres' percentages sum to represent 1. This might be hard to grasp, so I've put the non-normalized one next to this plot to make the difference clear. Some colors are used twice, but there is no color scheme available in Altair that supports more than 20 colors, so this will have to do for now 😉. You can hover over the bars to get details of those bars and click on legenda items to highlight a genre. &lt;/p&gt;

&lt;h2&gt;
  
  
  Top genres with percentages 📊
&lt;/h2&gt;

&lt;p&gt;The following plot is very nice when it's interactive, so please check out that one on my own &lt;a href="https://www.baukebrenninkmeijer.nl/posts/spotify-listening-history-analysis-part-2.html#Top-genres-with-percentages-%F0%9F%93%8A" rel="noopener noreferrer"&gt;website&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Ft6454cocfkq7iftyk2qi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Ft6454cocfkq7iftyk2qi.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There are definitely some interesting things in theses plots. We can see some consistent attendees that we also saw in the most listened genres in general, so that's not a big surprise. For example, these include &lt;em&gt;rap&lt;/em&gt;, &lt;em&gt;edm&lt;/em&gt; and &lt;em&gt;hip hop&lt;/em&gt;. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Seasonal effects&lt;/strong&gt;: What is quite interesting is to see when the very common genres are not dominating the chart, like in December of 2016. Both in November and December of 2016 we see I was in a very strong Christmas mood, with &lt;em&gt;christmas&lt;/em&gt; covering 16% of songs in November and 51%(!) in December.  The top genres in December are &lt;em&gt;adult standard&lt;/em&gt; (whatever that may be), &lt;em&gt;easy listening&lt;/em&gt;, &lt;em&gt;christmas&lt;/em&gt; and &lt;em&gt;lounge&lt;/em&gt;. Those definitely are in the same segment, so it's not surprising that those other genres appear alongside Christmas in a heavy Christmas month. We do not see this seasonal effect in 2017 and 2018, but those years my Christmas music urge was just less, so this drop is explainable. Instead of Christmas, in December of 2018 &lt;em&gt;emo rap&lt;/em&gt; is in my top 5 genres 🤔. That might be interesting to look at in another blog post. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Electronic periods&lt;/strong&gt;: Something else that stands out is that there are &lt;em&gt;electronic music&lt;/em&gt; periods, like June, July and August of 2017 and January of 2018. However, both &lt;em&gt;edm&lt;/em&gt; and &lt;em&gt;electro house&lt;/em&gt; are present in essentially each month as high scorers, so I'm definitely a fan in general. But these peak months still stand out. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rise of Rap&lt;/strong&gt;:  The last thing that is interesting is probably the fact that &lt;em&gt;rap&lt;/em&gt; and &lt;em&gt;hip hop&lt;/em&gt; have almost exclusively been the top 2 from February 2018 to January 2019. This indicates a move away from the more electronic genres and more towards hip hop. A possible reason for this might be the move towards more set-based plays for electronic music, which are generally not on Spotify, but on platforms like YouTube. Otherwise, it might just be an actual preference shift. However, I do still listen to a lot of these types of music, so I suspect the former. Looking at data from 2019 and 2020 might give some insight in this. &lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Top genres without percentages 🏆
&lt;/h2&gt;

&lt;p&gt;So we've seen how the genres relate to each other in terms of percentages per month. We can also see what the top genres are per month, but it can definitely still be improved. I really just want a list with the top 5 genres per month, ideally easily readable and pretty close to the example we had from Last.fm. &lt;/p&gt;

&lt;p&gt;As a reminder, that looked like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F76x0mpv4ep4qnm3pvd6u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F76x0mpv4ep4qnm3pvd6u.png" alt="Your top genres, plotted per week."&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can get a list of the top genres per month by grouping and then applying list on the Series.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;top_genres_per_month&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;top_genres_per_month_with_perc&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;year&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;month&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;genre&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reset_index&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;top_genres_per_month&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;year&lt;/th&gt;
      &lt;th&gt;month&lt;/th&gt;
      &lt;th&gt;genre&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt;0&lt;/th&gt;
      &lt;td&gt;2016&lt;/td&gt;
      &lt;td&gt;9&lt;/td&gt;
      &lt;td&gt;[rap, pop rap, hip hop, pop, indie pop rap]&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;1&lt;/th&gt;
      &lt;td&gt;2016&lt;/td&gt;
      &lt;td&gt;10&lt;/td&gt;
      &lt;td&gt;[edm, pop, rap, electro house, hip hop]&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;We then create a numpy array from these values and apply them column by column to new dataframe columns.&lt;/p&gt;

&lt;p&gt;Until we finally arrive at the following dataframe. Now, still is pretty much what I wanted, so I'm happy with the result. &lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;year&lt;/th&gt;
      &lt;th colspan="4"&gt;2016&lt;/th&gt;
      &lt;th colspan="6"&gt;2017&lt;/th&gt;
      &lt;th&gt;...&lt;/th&gt;
      &lt;th colspan="9"&gt;2018&lt;/th&gt;
      &lt;th&gt;2019&lt;/th&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;month&lt;/th&gt;
      &lt;th&gt;9&lt;/th&gt;
      &lt;th&gt;10&lt;/th&gt;
      &lt;th&gt;11&lt;/th&gt;
      &lt;th&gt;12&lt;/th&gt;
      &lt;th&gt;1&lt;/th&gt;
      &lt;th&gt;2&lt;/th&gt;
      &lt;th&gt;3&lt;/th&gt;
      &lt;th&gt;4&lt;/th&gt;
      &lt;th&gt;5&lt;/th&gt;
      &lt;th&gt;6&lt;/th&gt;
      &lt;th&gt;...&lt;/th&gt;
      &lt;th&gt;4&lt;/th&gt;
      &lt;th&gt;5&lt;/th&gt;
      &lt;th&gt;6&lt;/th&gt;
      &lt;th&gt;7&lt;/th&gt;
      &lt;th&gt;8&lt;/th&gt;
      &lt;th&gt;9&lt;/th&gt;
      &lt;th&gt;10&lt;/th&gt;
      &lt;th&gt;11&lt;/th&gt;
      &lt;th&gt;12&lt;/th&gt;
      &lt;th&gt;1&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt;genre_1&lt;/th&gt;
      &lt;td&gt;rap&lt;/td&gt;
      &lt;td&gt;edm&lt;/td&gt;
      &lt;td&gt;edm&lt;/td&gt;
      &lt;td&gt;adult standards&lt;/td&gt;
      &lt;td&gt;pop&lt;/td&gt;
      &lt;td&gt;electro house&lt;/td&gt;
      &lt;td&gt;pop rap&lt;/td&gt;
      &lt;td&gt;rap&lt;/td&gt;
      &lt;td&gt;rap&lt;/td&gt;
      &lt;td&gt;pop&lt;/td&gt;
      &lt;td&gt;...&lt;/td&gt;
      &lt;td&gt;rap&lt;/td&gt;
      &lt;td&gt;edm&lt;/td&gt;
      &lt;td&gt;rap&lt;/td&gt;
      &lt;td&gt;rap&lt;/td&gt;
      &lt;td&gt;rap&lt;/td&gt;
      &lt;td&gt;rap&lt;/td&gt;
      &lt;td&gt;rap&lt;/td&gt;
      &lt;td&gt;rap&lt;/td&gt;
      &lt;td&gt;rap&lt;/td&gt;
      &lt;td&gt;rap&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;genre_2&lt;/th&gt;
      &lt;td&gt;pop rap&lt;/td&gt;
      &lt;td&gt;pop&lt;/td&gt;
      &lt;td&gt;pop&lt;/td&gt;
      &lt;td&gt;easy listening&lt;/td&gt;
      &lt;td&gt;edm&lt;/td&gt;
      &lt;td&gt;filter house&lt;/td&gt;
      &lt;td&gt;rap&lt;/td&gt;
      &lt;td&gt;pop rap&lt;/td&gt;
      &lt;td&gt;pop rap&lt;/td&gt;
      &lt;td&gt;edm&lt;/td&gt;
      &lt;td&gt;...&lt;/td&gt;
      &lt;td&gt;hip hop&lt;/td&gt;
      &lt;td&gt;rap&lt;/td&gt;
      &lt;td&gt;pop rap&lt;/td&gt;
      &lt;td&gt;hip hop&lt;/td&gt;
      &lt;td&gt;edm&lt;/td&gt;
      &lt;td&gt;hip hop&lt;/td&gt;
      &lt;td&gt;hip hop&lt;/td&gt;
      &lt;td&gt;hip hop&lt;/td&gt;
      &lt;td&gt;hip hop&lt;/td&gt;
      &lt;td&gt;hip hop&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;genre_3&lt;/th&gt;
      &lt;td&gt;hip hop&lt;/td&gt;
      &lt;td&gt;rap&lt;/td&gt;
      &lt;td&gt;adult standards&lt;/td&gt;
      &lt;td&gt;christmas&lt;/td&gt;
      &lt;td&gt;rock&lt;/td&gt;
      &lt;td&gt;dance-punk&lt;/td&gt;
      &lt;td&gt;edm&lt;/td&gt;
      &lt;td&gt;hip hop&lt;/td&gt;
      &lt;td&gt;hip hop&lt;/td&gt;
      &lt;td&gt;electro house&lt;/td&gt;
      &lt;td&gt;...&lt;/td&gt;
      &lt;td&gt;edm&lt;/td&gt;
      &lt;td&gt;electro house&lt;/td&gt;
      &lt;td&gt;hip hop&lt;/td&gt;
      &lt;td&gt;pop rap&lt;/td&gt;
      &lt;td&gt;hip hop&lt;/td&gt;
      &lt;td&gt;pop rap&lt;/td&gt;
      &lt;td&gt;edm&lt;/td&gt;
      &lt;td&gt;pop rap&lt;/td&gt;
      &lt;td&gt;pop rap&lt;/td&gt;
      &lt;td&gt;pop rap&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;genre_4&lt;/th&gt;
      &lt;td&gt;pop&lt;/td&gt;
      &lt;td&gt;electro house&lt;/td&gt;
      &lt;td&gt;christmas&lt;/td&gt;
      &lt;td&gt;lounge&lt;/td&gt;
      &lt;td&gt;dance pop&lt;/td&gt;
      &lt;td&gt;electronic&lt;/td&gt;
      &lt;td&gt;hip hop&lt;/td&gt;
      &lt;td&gt;conscious hip hop&lt;/td&gt;
      &lt;td&gt;pop&lt;/td&gt;
      &lt;td&gt;brostep&lt;/td&gt;
      &lt;td&gt;...&lt;/td&gt;
      &lt;td&gt;pop&lt;/td&gt;
      &lt;td&gt;hip hop&lt;/td&gt;
      &lt;td&gt;edm&lt;/td&gt;
      &lt;td&gt;edm&lt;/td&gt;
      &lt;td&gt;pop rap&lt;/td&gt;
      &lt;td&gt;edm&lt;/td&gt;
      &lt;td&gt;pop rap&lt;/td&gt;
      &lt;td&gt;pop&lt;/td&gt;
      &lt;td&gt;pop&lt;/td&gt;
      &lt;td&gt;edm&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;genre_5&lt;/th&gt;
      &lt;td&gt;indie pop rap&lt;/td&gt;
      &lt;td&gt;hip hop&lt;/td&gt;
      &lt;td&gt;easy listening&lt;/td&gt;
      &lt;td&gt;dutch hip hop&lt;/td&gt;
      &lt;td&gt;tropical house&lt;/td&gt;
      &lt;td&gt;alternative dance&lt;/td&gt;
      &lt;td&gt;pop&lt;/td&gt;
      &lt;td&gt;west coast rap&lt;/td&gt;
      &lt;td&gt;conscious hip hop&lt;/td&gt;
      &lt;td&gt;electronic trap&lt;/td&gt;
      &lt;td&gt;...&lt;/td&gt;
      &lt;td&gt;pop rap&lt;/td&gt;
      &lt;td&gt;pop&lt;/td&gt;
      &lt;td&gt;pop&lt;/td&gt;
      &lt;td&gt;electro house&lt;/td&gt;
      &lt;td&gt;electro house&lt;/td&gt;
      &lt;td&gt;pop&lt;/td&gt;
      &lt;td&gt;pop&lt;/td&gt;
      &lt;td&gt;edm&lt;/td&gt;
      &lt;td&gt;emo rap&lt;/td&gt;
      &lt;td&gt;electro house&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Now, in my original post I colored this table to be easier interpretable, but Dev.to does not allow that :(. So please see the original post for the full version. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;But, it looked like this!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F2mzyrg5ldeb77g8aiovw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F2mzyrg5ldeb77g8aiovw.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Better get the 🚒 cause this table is 🔥. &lt;/p&gt;

&lt;p&gt;This is really close to the Last.fm plot, apart from the lines between points that require 10 years of D3.js experience. We see some similar pattern to those in the earlier plot, but also can see some new insights. Here, we can focus some more on the anomalies that are present, like &lt;em&gt;indie pop rap&lt;/em&gt;, &lt;em&gt;dutch hip hop&lt;/em&gt;, &lt;em&gt;filter house&lt;/em&gt; and &lt;em&gt;conscious hip hop&lt;/em&gt;. These stand out more using this representation than before, which focused more on trends. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Insights&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;More electronic peaks&lt;/strong&gt;: We can see that February 2017 was actually also a peak in electronic music, but due to similar colors in the previous plot this was a bit hidden. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pure hip hop periods&lt;/strong&gt;: Furthermore, we can also see there are some pure hip hop periods, like April and May of 2017, where EDM and electro house are not present at all, and we see more specific hip hop genres make way like &lt;em&gt;west coast rap&lt;/em&gt; and &lt;em&gt;conscious hip hop&lt;/em&gt;. &lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  In conclusion
&lt;/h1&gt;

&lt;p&gt;In part 2 we took a closer look at what genres I listen to and how that has developed over time. There were some very interesting insights, like the effects of holidays, and the change of music preference towards rap. We also recreated the plot from Last.fm, as close as possible at least. I'm quite happy with the outcome, but definitely have some newfound respect for Spotify analysts that have to do this for way more people. Although generalization also brings some advantages of course. Doing these analyses also is improving my skills with Pandas, because I have not previously worked that much with time data, so this is a great exercise. Also, having to look into the details of &lt;code&gt;.groupby&lt;/code&gt;, and how it operates on timeseries aggregates and what operations are possible were great. For instance, I learned you can do a groupby on a datetime index or column like so:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;datetime-column&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and even multi-index this for month/year using:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;datetime-column&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;datetime-column&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;month&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Which is very cool and way cleaner than what I used! But I'm getting sidetracked.&lt;/p&gt;

&lt;p&gt;Rounding off; thank you for reading and sticking with me! I'm very curious what results Part 3 will bring. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Topics for part 3:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;An analysis of musical features, like energy, danceability and acousticness. Those are numeric values and thus allow for some different visualizations then all of the discrete values of this blogpost. &lt;/li&gt;
&lt;li&gt;Which songs do I listen to that are emo rap. This is probably quite a small point of research, but still I'm quite curious.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you liked this blogpost, don't hesitate to reach out to me on &lt;a href="https://www.linkedin.com/in/bauke-brenninkmeijer-40143310b" rel="noopener noreferrer"&gt;linkedin&lt;/a&gt; or &lt;a href="https://twitter.com/Bauke_B" rel="noopener noreferrer"&gt;twitter&lt;/a&gt;. 😊&lt;/p&gt;

&lt;p&gt;
    &lt;a href="https://github.com/Baukebrenninkmeijer" rel="noopener noreferrer"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimg.shields.io%2Fgithub%2Ffollowers%2FBaukebrenninkmeijer.svg%3Flabel%3DGitHub%26style%3Dsocial" alt="GitHub"&gt;&lt;/a&gt;
    &lt;a href="https://www.linkedin.com/in/bauke-brenninkmeijer-40143310b" rel="noopener noreferrer"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimg.shields.io%2Fbadge%2FLinkedIn--_.svg%3Fstyle%3Dsocial%26logo%3Dlinkedin" alt="LinkedIn"&gt;&lt;/a&gt;
&lt;/p&gt;

</description>
      <category>python</category>
      <category>jupyter</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Analyzing my Spotify listening history 🎵 - Part 1</title>
      <dc:creator>Bauke Brenninkmeijer</dc:creator>
      <pubDate>Wed, 21 Oct 2020 22:10:14 +0000</pubDate>
      <link>https://dev.to/baukebrenninkmeijer/analyzing-my-spotify-listening-history-part-1-3jl4</link>
      <guid>https://dev.to/baukebrenninkmeijer/analyzing-my-spotify-listening-history-part-1-3jl4</guid>
      <description>&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.baukebrenninkmeijer.nl/posts/spotify-listening-history-analysis-part-2.html" rel="noopener noreferrer"&gt;Link to Part 2&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This post was originally written on my personal website, where the charts and plots are interactive. If you are missing that, check out &lt;a href="https://www.baukebrenninkmeijer.nl/posts/spotify-listening-history-analysis-part-1.html" rel="noopener noreferrer"&gt;the original&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I like to have everything in my life tracked in some way. Preferably, knowingly (Looking at you, Facebook), cause it allows you to analyze the data and find interesting things (Might be related with becoming a data scientist)! I've always been a fan of the features provided by &lt;a href="https://www.last.fm" rel="noopener noreferrer"&gt;Last.fm&lt;/a&gt; to track you listening behaviour across apps and platforms. It allows you to see stuff like your favorite artists per month, or your affinity with certain genres over time like in the image below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F76x0mpv4ep4qnm3pvd6u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F76x0mpv4ep4qnm3pvd6u.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Buuuuuut&lt;/em&gt;, like Last.fm, most of these analyses are paid completely or partly. In the case of Last.fm, you get this plot for free but anything more will cost you some paper. I'm Dutch, so let's see if we can do it ourselves!&lt;/p&gt;

&lt;p&gt;I wanted to have my listening history, and currently there is an &lt;a href="https://developer.spotify.com/documentation/web-api/reference/player/get-recently-played/" rel="noopener noreferrer"&gt;API call&lt;/a&gt; that provides that functionality. However, I wanted to do this at the start of 2019 (Last year was pretty busy, so I didn't get around to doing this until now 😅) and this wasn't available back then, or at least I couldn't find it. Spotify, like many other companies, has an option to download your personal information. Unfortunately, this data only contained data for three months (they upped it to a year now, which is great!). &lt;/p&gt;

&lt;p&gt;But, given this limitation, &lt;strong&gt;the only way I could think of to get this was to ask Spotify for my personal data&lt;/strong&gt;. Under the GDPR, they are required to provide this information, so I thought this had a good shot. Well, after e-mailing back and forth a whole bunch of times, eventually I got in touch with the Data Privacy Office and they provided me with my complete listening history! &lt;/p&gt;

&lt;p&gt;So that's the data that we'll be working with. Like I said, I requested the data in early 2019, so my history goes from my beginning of Spotify (ca. 2013) until then. So lets see what we're dealing with. &lt;/p&gt;

&lt;h2&gt;
  
  
  The Data ✨
&lt;/h2&gt;

&lt;p&gt;I received one main file from spotify called &lt;code&gt;EndSong.json&lt;/code&gt; which had json items as follows. In total, I got 39,229 songs played, which is quite a lot and definitely enough to do some interesting things with!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"ts"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"2013-10-09 20:03:57 UTC"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"username"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"xxxxxxxxxx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"platform"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"xxxxxxx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"ms_played"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"5969"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"conn_country"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"NL"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"ip_addr_decrypted"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"xx.xx.xx.xx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"user_agent_decrypted"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"xxxxxxxxxxx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"master_metadata_track_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"You Make Me"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"master_metadata_album_artist_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"Avicii"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"master_metadata_album_album_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"You Make Me"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"reason_start"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"click-row"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"reason_end"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"click-row"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"shuffle"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"skipped"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"offline"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"offline_timestamp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"incognito_mode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"metro_code"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"longitude"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"latitude"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For our analysis, we're gonna use the ol' trusty &lt;a href="https://pandas.pydata.org/" rel="noopener noreferrer"&gt;Pandas&lt;/a&gt;. The data is in the json-lines format, so we use the python &lt;a href="https://pypi.org/project/json-lines/" rel="noopener noreferrer"&gt;json-lines&lt;/a&gt; package to read our data. We'll also drop some useless columns and convert the timestamp column to a python datetime object. Furthermore, we use the UTF-8 encoding while reading our data, to support tokens that would otherwise be malformed like the ë character. Lastly, we also create separate columns for many of our time attributes like year, month and day, since this makes it easy for filtering during plotting.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;lines&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;json_lines&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data/EndSong.json&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
    &lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;drop&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;username&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;user_agent_decrypted&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;incognito_mode&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;platform&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ip_addr_decrypted&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_datetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;date&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;date&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;year&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;year&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;month&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;month&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;day&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;day&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;dow&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dayofweek&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;hour&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;hour&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;head&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;ts&lt;/th&gt;
      &lt;th&gt;ms_played&lt;/th&gt;
      &lt;th&gt;conn_country&lt;/th&gt;
      &lt;th&gt;master_metadata_track_name&lt;/th&gt;
      &lt;th&gt;master_metadata_album_artist_name&lt;/th&gt;
      &lt;th&gt;master_metadata_album_album_name&lt;/th&gt;
      &lt;th&gt;reason_start&lt;/th&gt;
      &lt;th&gt;reason_end&lt;/th&gt;
      &lt;th&gt;shuffle&lt;/th&gt;
      &lt;th&gt;skipped&lt;/th&gt;
      &lt;th&gt;...&lt;/th&gt;
      &lt;th&gt;city&lt;/th&gt;
      &lt;th&gt;region&lt;/th&gt;
      &lt;th&gt;episode_name&lt;/th&gt;
      &lt;th&gt;episode_show_name&lt;/th&gt;
      &lt;th&gt;date&lt;/th&gt;
      &lt;th&gt;year&lt;/th&gt;
      &lt;th&gt;month&lt;/th&gt;
      &lt;th&gt;day&lt;/th&gt;
      &lt;th&gt;dow&lt;/th&gt;
      &lt;th&gt;hour&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt;0&lt;/th&gt;
      &lt;td&gt;2013-10-09 20:24:30+00:00&lt;/td&gt;
      &lt;td&gt;15010&lt;/td&gt;
      &lt;td&gt;NL&lt;/td&gt;
      &lt;td&gt;Wild for the Night (feat. Skrillex &amp;amp; Birdy Nam...&lt;/td&gt;
      &lt;td&gt;A$AP Rocky&lt;/td&gt;
      &lt;td&gt;LONG.LIVE.A$AP (Deluxe Version)&lt;/td&gt;
      &lt;td&gt;unknown&lt;/td&gt;
      &lt;td&gt;click-row&lt;/td&gt;
      &lt;td&gt;False&lt;/td&gt;
      &lt;td&gt;False&lt;/td&gt;
      &lt;td&gt;...&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;2013-10-09&lt;/td&gt;
      &lt;td&gt;2013&lt;/td&gt;
      &lt;td&gt;10&lt;/td&gt;
      &lt;td&gt;9&lt;/td&gt;
      &lt;td&gt;2&lt;/td&gt;
      &lt;td&gt;20&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;1&lt;/th&gt;
      &lt;td&gt;2013-10-09 20:19:20+00:00&lt;/td&gt;
      &lt;td&gt;68139&lt;/td&gt;
      &lt;td&gt;NL&lt;/td&gt;
      &lt;td&gt;Buzzin'&lt;/td&gt;
      &lt;td&gt;OVERWERK&lt;/td&gt;
      &lt;td&gt;The Nthº&lt;/td&gt;
      &lt;td&gt;unknown&lt;/td&gt;
      &lt;td&gt;click-row&lt;/td&gt;
      &lt;td&gt;False&lt;/td&gt;
      &lt;td&gt;False&lt;/td&gt;
      &lt;td&gt;...&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;2013-10-09&lt;/td&gt;
      &lt;td&gt;2013&lt;/td&gt;
      &lt;td&gt;10&lt;/td&gt;
      &lt;td&gt;9&lt;/td&gt;
      &lt;td&gt;2&lt;/td&gt;
      &lt;td&gt;20&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;2&lt;/th&gt;
      &lt;td&gt;2013-10-09 20:21:54+00:00&lt;/td&gt;
      &lt;td&gt;23643&lt;/td&gt;
      &lt;td&gt;NL&lt;/td&gt;
      &lt;td&gt;Blue&lt;/td&gt;
      &lt;td&gt;Gemini&lt;/td&gt;
      &lt;td&gt;Blue EP&lt;/td&gt;
      &lt;td&gt;unknown&lt;/td&gt;
      &lt;td&gt;click-row&lt;/td&gt;
      &lt;td&gt;False&lt;/td&gt;
      &lt;td&gt;False&lt;/td&gt;
      &lt;td&gt;...&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;2013-10-09&lt;/td&gt;
      &lt;td&gt;2013&lt;/td&gt;
      &lt;td&gt;10&lt;/td&gt;
      &lt;td&gt;9&lt;/td&gt;
      &lt;td&gt;2&lt;/td&gt;
      &lt;td&gt;20&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;3&lt;/th&gt;
      &lt;td&gt;2013-10-09 20:20:29+00:00&lt;/td&gt;
      &lt;td&gt;68063&lt;/td&gt;
      &lt;td&gt;NL&lt;/td&gt;
      &lt;td&gt;Blue&lt;/td&gt;
      &lt;td&gt;Gemini&lt;/td&gt;
      &lt;td&gt;Blue EP&lt;/td&gt;
      &lt;td&gt;unknown&lt;/td&gt;
      &lt;td&gt;click-row&lt;/td&gt;
      &lt;td&gt;False&lt;/td&gt;
      &lt;td&gt;False&lt;/td&gt;
      &lt;td&gt;...&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;2013-10-09&lt;/td&gt;
      &lt;td&gt;2013&lt;/td&gt;
      &lt;td&gt;10&lt;/td&gt;
      &lt;td&gt;9&lt;/td&gt;
      &lt;td&gt;2&lt;/td&gt;
      &lt;td&gt;20&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h1&gt;
  
  
  Yearly &amp;amp; Monthly behaviour 📆
&lt;/h1&gt;

&lt;p&gt;One of the first things that might be interesting to see is how my usage of Spotify has changed over the years. For this, we can easily plot the number of songs player by year and by month. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fpvallu8ndwys9v58ncqg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fpvallu8ndwys9v58ncqg.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the plots above you see my total songs listened. It immediately becomes clear that I got my data in early 2019, given the drop in 2019 and lack of data in 2020. But what is interesting is the steady increasing line the previous years. It shows that I slowly started using spotify more and more. The start coincides with when I started paying for Spotify as well, which is not very surprising (Yay for no ads and song selection 🤗). &lt;/p&gt;

&lt;p&gt;When looking at the number of songs per month, we can still see a decline in listening activity since the spike at October 2018. The peak that ranges from September 2018 to November 2018 can be explained by me starting a new internship where I was playing spotify while working the whole day. In October my total songs listened more than doubled compared to only two months earlier (1615 to 3273 songs played).&lt;/p&gt;

&lt;p&gt;Furthermore, we can see that I also used Spotify for a short while in 2016, but stopped using it again for about a year. Then I picked it up again in 2017 and never stopped using it afterwards. This is likely because I was using the web version of Spotify for a while, where you can use adblock to block the ads. But not being able to use it on your phone reduced the utility of Spotify pretty significantly, so I switched back over to my previous way, which was a combination of Poweramp and Google Play Music. &lt;/p&gt;

&lt;h1&gt;
  
  
  Daily behaviour 🕺
&lt;/h1&gt;

&lt;p&gt;Well, I'm already learning a lot about Altair, cause creating this plot in its current form easily took three hours. Altair does not like it when you aggregate a value in several places. &lt;em&gt;But&lt;/em&gt; the result is also quite a nice visual. I plotted the daily distribution per hours per year. Now, the value is the sum of the whole year, so it's no wonder that the differences are really similar to what we say in the yearly distribution. More interesting would be the percentage per hour per year, which would tell me something about my listening behaviour throughout the years. &lt;/p&gt;

&lt;p&gt;&lt;em&gt;The yearly visuals only show 2016 till 2019, because the others years don't have enough data.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F4ygf87t8n3a0oqlz8ti6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F4ygf87t8n3a0oqlz8ti6.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There are several noteworthy things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;In 2016&lt;/strong&gt;, there was a big spike between 13:00 and 14:00. 2016 is split between my third and fourth year at university, of which in the fourth year I also was on the board of the &lt;a href="https://dorans.nl/" rel="noopener noreferrer"&gt;e-sports association&lt;/a&gt;. I just barely didn't have enough credits to get my bachelor's degree in my third year, so my fourth year was pretty empty. The combination likely contributed to many days where I had lunch and then closed myself of with music, to work on association matters. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;In 2016&lt;/strong&gt;, there is a big spike at 9:00. This makes sense, because that was always the time I was cycling to my university. Over the years, I started listening to podcasts more, which is why you can see the 9:00 value decline over 4 years. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;In 2018 and 2019&lt;/strong&gt;, my listening during evening hours decreased quite significantly. Earlier, I had a spike at 22:00 but this completely faded during the first month of 2019. What happened? Not sure, to be frank. It might be that I had more nights planned with friends? &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;In 2017 and 2018&lt;/strong&gt;, I somehow play 4% of my music daily before 7:00. Now, this is essentially impossible since I almost never get up before 7. I'm not sure why this is shown to be the case. &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;em&gt;Tip: I've since learned that you can so things like define variables and aggregate values in Altair. An example can be found on their &lt;a href="https://altair-viz.github.io/gallery/percentage_of_total.html" rel="noopener noreferrer"&gt;website&lt;/a&gt;. However, did not go back and redo the analyses using that. That's for another time. 😉&lt;/em&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Weekly behaviour
&lt;/h1&gt;

&lt;p&gt;To get a clear view of weekly listening behaviour, we can create a heatmap with the hours and day of week, with the color indicating the number of listens on that hour. We can see that the working week made a lot of difference, with listening mainly focused on Tuesday, Wednesday, Thursday and Friday, with a little bit as well on Monday afternoon. We all need to wake up a bit first on Monday morning ☕. &lt;/p&gt;

&lt;p&gt;Interestingly enough, we see some hours that have no plays at all like Monday morning 2:00-3:00. I'm a bit skeptic that I &lt;em&gt;never&lt;/em&gt; plays anything at all there, but I don't have an explanation for it. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fkxu3wr31uuh26obhudib.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fkxu3wr31uuh26obhudib.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Genres 🎧
&lt;/h1&gt;

&lt;p&gt;Now that we have some insight into my listening behaviour, we can analyze what I listen to a bit more closely. For example, what genres do I listen to most and how do these change? Here we get a bit closer to the visualization we got from Last.fm.&lt;/p&gt;

&lt;p&gt;However, before we can do that, we need to get the genres associated with our songs. This was not included in the data Spotify sent over, so we need to use their API to get this information. Spotify does not apply categories per song, but rather per artist. Internally, Spotify uses special URIs to indicate different concepts like artists, songs and albums. For example, a track can be indicated with &lt;code&gt;spotify:track:6rqhFgbbKwnb9MLmUQDhG6&lt;/code&gt;. &lt;br&gt;
That relates for example to &lt;iframe src="https://open.spotify.com/embed/track/6rqhFgbbKwnb9MLmUQDhG6" width="100%" height="80px"&gt;
&lt;/iframe&gt;
.&lt;/p&gt;

&lt;p&gt;These URIs refer to a specific object, whereas the artist in the data I received does not. So we need to use the Spotify search function to retrieve the correct Spotify object of an artist, and then we can retrieve the genres from there. Now, either the data they sent is somewhat corrupted or their music management is a bit lackluster, because there were still quite some artists without an Spotify URI and/or without any defined genres. The latter makes sense, since this takes a lot of work by Spotify. &lt;/p&gt;

&lt;p&gt;Let's have a short look at the data. I'm showing the first two rows (so the genres of the first two songs) and all the columns. The columns just have a number, but indicate the first to twenty-first genre of each song. &lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;0&lt;/th&gt;
      &lt;th&gt;1&lt;/th&gt;
      &lt;th&gt;2&lt;/th&gt;
      &lt;th&gt;3&lt;/th&gt;
      &lt;th&gt;4&lt;/th&gt;
      &lt;th&gt;5&lt;/th&gt;
      &lt;th&gt;6&lt;/th&gt;
      &lt;th&gt;7&lt;/th&gt;
      &lt;th&gt;8&lt;/th&gt;
      &lt;th&gt;9&lt;/th&gt;
      &lt;th&gt;...&lt;/th&gt;
      &lt;th&gt;11&lt;/th&gt;
      &lt;th&gt;12&lt;/th&gt;
      &lt;th&gt;13&lt;/th&gt;
      &lt;th&gt;14&lt;/th&gt;
      &lt;th&gt;15&lt;/th&gt;
      &lt;th&gt;16&lt;/th&gt;
      &lt;th&gt;17&lt;/th&gt;
      &lt;th&gt;18&lt;/th&gt;
      &lt;th&gt;19&lt;/th&gt;
      &lt;th&gt;20&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt;0&lt;/th&gt;
      &lt;td&gt;east coast hip hop&lt;/td&gt;
      &lt;td&gt;hip hop&lt;/td&gt;
      &lt;td&gt;pop&lt;/td&gt;
      &lt;td&gt;pop rap&lt;/td&gt;
      &lt;td&gt;rap&lt;/td&gt;
      &lt;td&gt;trap music&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;...&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;1&lt;/th&gt;
      &lt;td&gt;catstep&lt;/td&gt;
      &lt;td&gt;complextro&lt;/td&gt;
      &lt;td&gt;edm&lt;/td&gt;
      &lt;td&gt;electro house&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;...&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Because we added all genres for each artist as a new column, we have quite a wide dataframe. The artist with the most genres has 21(!) genres. Let's see who that is. &lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;&lt;/th&gt;
      &lt;th&gt;ts&lt;/th&gt;
      &lt;th&gt;ms_played&lt;/th&gt;
      &lt;th&gt;conn_country&lt;/th&gt;
      &lt;th&gt;master_metadata_track_name&lt;/th&gt;
      &lt;th&gt;master_metadata_album_artist_name&lt;/th&gt;
      &lt;th&gt;master_metadata_album_album_name&lt;/th&gt;
      &lt;th&gt;reason_start&lt;/th&gt;
      &lt;th&gt;reason_end&lt;/th&gt;
      &lt;th&gt;shuffle&lt;/th&gt;
      &lt;th&gt;skipped&lt;/th&gt;
      &lt;th&gt;...&lt;/th&gt;
      &lt;th&gt;11&lt;/th&gt;
      &lt;th&gt;12&lt;/th&gt;
      &lt;th&gt;13&lt;/th&gt;
      &lt;th&gt;14&lt;/th&gt;
      &lt;th&gt;15&lt;/th&gt;
      &lt;th&gt;16&lt;/th&gt;
      &lt;th&gt;17&lt;/th&gt;
      &lt;th&gt;18&lt;/th&gt;
      &lt;th&gt;19&lt;/th&gt;
      &lt;th&gt;20&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;th&gt;32948&lt;/th&gt;
      &lt;td&gt;2018-10-26 21:01:59+00:00&lt;/td&gt;
      &lt;td&gt;255626&lt;/td&gt;
      &lt;td&gt;NL&lt;/td&gt;
      &lt;td&gt;Grounded&lt;/td&gt;
      &lt;td&gt;Pavement&lt;/td&gt;
      &lt;td&gt;Crooked Rain Crooked Rain&lt;/td&gt;
      &lt;td&gt;clickrow&lt;/td&gt;
      &lt;td&gt;trackdone&lt;/td&gt;
      &lt;td&gt;False&lt;/td&gt;
      &lt;td&gt;NaN&lt;/td&gt;
      &lt;td&gt;...&lt;/td&gt;
      &lt;td&gt;indie pop&lt;/td&gt;
      &lt;td&gt;indie rock&lt;/td&gt;
      &lt;td&gt;lo-fi&lt;/td&gt;
      &lt;td&gt;modern rock&lt;/td&gt;
      &lt;td&gt;new wave&lt;/td&gt;
      &lt;td&gt;noise pop&lt;/td&gt;
      &lt;td&gt;noise rock&lt;/td&gt;
      &lt;td&gt;post-punk&lt;/td&gt;
      &lt;td&gt;rock&lt;/td&gt;
      &lt;td&gt;slow core&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;It's some artist called Pavement! I've no idea who that is, but still interesting to see. The artist seems to be in the indie rock segment, where there are many subgenres, so it's not that surprising. I've heard that metal has a similar amount of subgenres, so it would be cool to do this analysis for a metal fan 🤘. &lt;/p&gt;

&lt;p&gt;But, clearly I am not in that segment. Lets see how many genres an average artist of mine has. We'll exclude artists with zero genres. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F57i3df0l77uab56eo1du.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F57i3df0l77uab56eo1du.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If we take the percentage of the artists with three or fewer genres, we see this is 52%. This is quite high, and means many people are pretty specific with regards to what genres they fall into for Spotify. We see quite a long tail distribution, with only 27% having more than 5 genres specified and only 2.8% more then 10 genres!&lt;/p&gt;
&lt;h1&gt;
  
  
  In conclusion
&lt;/h1&gt;

&lt;p&gt;We have done a pretty thorough analysis of my listening history on Spotify. We evaluated the high level listening behaviour on a monthly and yearly basis. We have also seen my daily listening behaviour and how it has changed throughout the years. We also started on the analysis of the genres, which we will continue in part 2!&lt;/p&gt;

&lt;p&gt;It has been really interesting to see how my preferences with regards to music over these years, and it is definitely contributing to my 'have everything tracked' KPIs. Since all of Spotify's data is accessible through the API, I might consider making a dashboard for these insights that updates automatically.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Furylgyrodhmwmz4ctwci.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Furylgyrodhmwmz4ctwci.gif" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Unfortunately, we didn't get to see the really cool stuff in this post. Things like recreation of the last.fm image and the changes in genres over time are very interesting, and I would have loved to already be able to show those. Please check out part 2 for that. I'll add links to that here as soon as that's out. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Topics covered in part 2:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;What are my top genres?&lt;/li&gt;
&lt;li&gt;Correlation between genres.&lt;/li&gt;
&lt;li&gt;How have my genres changed over time?&lt;/li&gt;
&lt;li&gt;Recreation of the Last.fm image. &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Learnings&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This blogpost has been a huge learning experience for me. It was my first time using Fastpages. It was my first time writing a blogpost in a jupyter notebook as well, and it was also my first time using Altair! All of those experiences were quite positive, and I especially like getting more familiar with Altair. Having a Grammar of Graphics tool in your toolbelt is an extremely valuable thing in the world of data science, although you might not use it on a daily basis. &lt;/p&gt;

&lt;p&gt;If you liked this blogpost, don't hesitate to reach out to me on &lt;a href="https://www.linkedin.com/in/bauke-brenninkmeijer-40143310b" rel="noopener noreferrer"&gt;linkedin&lt;/a&gt; or &lt;a href="https://twitter.com/Bauke_B" rel="noopener noreferrer"&gt;twitter&lt;/a&gt;. 😊&lt;/p&gt;

&lt;p&gt;
    &lt;a href="https://github.com/Baukebrenninkmeijer" rel="noopener noreferrer"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimg.shields.io%2Fgithub%2Ffollowers%2FBaukebrenninkmeijer.svg%3Flabel%3DGitHub%26style%3Dsocial" alt="GitHub"&gt;&lt;/a&gt;
    &lt;a href="https://www.linkedin.com/in/bauke-brenninkmeijer-40143310b" rel="noopener noreferrer"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimg.shields.io%2Fbadge%2FLinkedIn--_.svg%3Fstyle%3Dsocial%26logo%3Dlinkedin" alt="LinkedIn"&gt;&lt;/a&gt;
&lt;/p&gt;

</description>
      <category>python</category>
      <category>jupyter</category>
      <category>datascience</category>
    </item>
  </channel>
</rss>
