Martin Heinz

Posted on Aug 14, 2020 • Originally published at martinheinz.dev

History of Epidemics in a Single Chart

#javascript #d3js #datascience #showdev

COVID-19 is the current flavour of the month for data visualizations and everybody just wants to use this one dataset. In this article, however, we will take a step back for a second and look at the bigger picture - the whole history of the world's epidemics and pandemics. To do so, we will use a single interactive chart called a horizontal bar chart.

The full chart can be found at https://martinheinz.github.io/charts/horizontal-bar-chart/. It contains a list of almost 250 epidemics that happened between 1200 BC and 2020. Each bar represents one epidemic. The horizontal axis shows time in years, while the vertical axis lists the epidemics.

You can hover over each bar to see its name, time span and death toll. To see a further description of an epidemic, hover over its label on the left. You can use the fields at the top to drill down to a specific time frame. You can also sort the bars by total epidemic time span, start year or death toll.

The Dataset

As the title and topic suggest, the dataset for this article is a history, or rather a list, of the world's epidemics. The most complete list with the most accompanying data that I was able to find comes from the Wikipedia article here.

This dataset is really just a big table of all the plagues, epidemics or even minor outbreaks. As a quick sample, here is one row:

Event	Date	Location	Disease	Death Toll
1918 influenza pandemic ('Spanish flu')	1918–1920	Worldwide	Influenza A virus subtype H1N1	17–100 million

To be able to make any use of this data in a visualization, we need it in a slightly more computer-friendly format, which is CSV. I generated this CSV using a simple Python script which you can find here. All this script does is scrape the table from Wikipedia using BeautifulSoup, retrieve all the values from it and write them into a CSV file.

And here is an example row of the parsed data:

title,date,span,location,disease,toll
1918 flu pandemic,,1918-1920,Worldwide,Influenza A virus subtype H1N1  Spanish flu virus,50000000
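To use such a row in the browser, it has to become an object with numeric fields. The sketch below is only an illustration and not the parsing used by the chart: the helper name parseRow, the derived start, end and length fields, and the naive comma split (which would break on quoted commas and on BC dates) are all assumptions.

```javascript
// Hypothetical helper: turn one CSV line into a record object.
// Assumes no quoted commas and a span of the form "YYYY-YYYY" or "YYYY".
function parseRow(header, line) {
    const keys = header.split(",");
    const values = line.split(",");
    const row = {};
    keys.forEach((key, i) => { row[key] = values[i]; });

    const years = row.span.split("-").map(Number);
    const start = years[0];
    const end = years.length > 1 ? years[1] : years[0];  // single-year outbreak

    return {
        title: row.title,
        start: start,
        end: end,
        length: end - start,                              // duration in years
        toll: row.toll === "" ? null : Number(row.toll)   // null = unknown toll
    };
}

const header = "title,date,span,location,disease,toll";
const line = "1918 flu pandemic,,1918-1920,Worldwide,Influenza A virus subtype H1N1  Spanish flu virus,50000000";
const record = parseRow(header, line);
console.log(record.start, record.end, record.toll);  // 1918 1920 50000000
```

Representing an unknown death toll as null is a choice made for this sketch; any sentinel works as long as the filtering code agrees with it.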

As for alternative sources of data - I wasn't really able to find any other exhaustive list of this kind with enough information for each entry. What I was able to find on the internet was mostly just "Top Ten List of Epidemics" articles or lots of COVID-19 data. In case you know of a better dataset than this one, please let me know!

Horizontal Bar Chart

A horizontal bar chart is really just a normal bar chart turned 90 degrees - that is, a chart with data categories on the vertical axis and data values on the horizontal axis. It has several advantages over a normal bar chart, though.

One very simple advantage is that by putting category labels on the vertical axis, you gain much more space to display them. Another is the ability to display time, which is naturally shown on the horizontal axis. A normal bar chart can't do that, because its horizontal axis is taken up by the categories.

The next few advantages stem from the way we use the chart in this particular visualization. As you already saw in the demo above, the individual bars don't show just one value. They display both the length (in years) and the actual time frame of each epidemic. Unlike in a basic bar chart, where every bar is attached to an axis, the bars here float, and the starting and ending point of each bar carry extra information.

On top of that, we also use tooltips to communicate more data, as well as a color palette to show it in an easy-to-understand way. The choice of color palette matters, as a non-intuitive one can make the chart very hard to read. In general, it's safest to use high-contrast, diverging cool-warm palettes such as the ones described in this article.

Code

The code needed for this visualization is quite lengthy and most of it is not that interesting, so rather than going over every single line, I will just show and explain the most important parts. If you want to dive into the details, head over to https://github.com/MartinHeinz/charts/blob/master/horizontal-bar-chart/horizontal-bar-chart.js or check out my previous article about the Bee Swarm chart, where I show more details about the code and D3.js.

Filtering

The dataset displayed on this chart is quite big - it has almost 250 records, which might be hard to read when shown all at once. Filtering options are therefore essential for user experience. The GUI allows the user to filter by time range - that is, by the start and end year of the epidemics - and offers an option to filter out epidemics with an unknown death toll.
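As a rough idea of what these two filters do to the rows, here is a minimal sketch in plain JavaScript. The field names start, end and toll follow the records used in this article. The function name filterEpidemics, the use of null for an unknown toll and the sample rows are made up for illustration.

```javascript
// Hypothetical filter helper, not the function used by the chart.
function filterEpidemics(rows, fromYear, toYear, hideUnknownToll) {
    return rows.filter(function(d) {
        const inRange = d.start >= fromYear && d.end <= toYear;
        const tollKnown = d.toll !== null && !Number.isNaN(d.toll);
        return inRange && (!hideUnknownToll || tollKnown);
    });
}

// Made-up sample rows.
const sample = [
    { title: "A", start: 1875, end: 1876, toll: 40000 },
    { title: "B", start: 1918, end: 1920, toll: 50000000 },
    { title: "C", start: 1950, end: 1951, toll: null }   // unknown death toll
];

// Only the time range is applied: B and C remain.
console.log(filterEpidemics(sample, 1900, 2020, false).map(d => d.title));
// Unknown tolls are hidden as well: only B remains.
console.log(filterEpidemics(sample, 1900, 2020, true).map(d => d.title));
```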

Both of these filters require some manipulation of the dataset as well as the axes. Iterating over the list of rows and removing or adding rows according to the filter criteria is easy enough. How do we update the chart once the updated data is ready, though?

The first step is to update the scales for both the X and Y axes. Each of those scales has a domain which is mapped to a range. In our case, for the X axis we map years (the domain) to the width of our chart (the range):

xScale = d3.scaleLinear()
           .domain([
               d3.min(dataSet, function(d) { return d.start; }),
               d3.max(dataSet, function(d) { return d.end; })
           ])
           .range([margin.left, width - margin.right])

As the code snippet above shows, we take the minimum start year and the maximum end year from all rows in our dataset and map them to the size of the chart in the browser window. With the default settings on this chart, this ends up being years [1875, 2020] projected onto pixels [250, 980].
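To see what that mapping means in numbers, here is a minimal plain-JavaScript stand-in for a linear scale. It is a sketch for illustration only: linearScale and xScaleSketch are hypothetical names, and the real d3.scaleLinear offers much more (clamping, ticks, inversion).

```javascript
// Minimal stand-in for d3.scaleLinear().domain([d0, d1]).range([r0, r1]).
function linearScale(domain, range) {
    const [d0, d1] = domain;
    const [r0, r1] = range;
    return function(value) {
        const t = (value - d0) / (d1 - d0);   // 0 at domain start, 1 at domain end
        return r0 + t * (r1 - r0);
    };
}

const xScaleSketch = linearScale([1875, 2020], [250, 980]);
console.log(xScaleSketch(1875));  // 250, the left edge of the plot area
console.log(xScaleSketch(2020));  // 980, the right edge of the plot area
console.log(xScaleSketch(1918));  // a pixel position between the two edges
```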

A similar case applies to the vertical (Y) axis, where we need to map the titles of all the epidemics to individual ticks:

yScale = d3.scaleBand()
           .domain(dataSet.map(function(d) { return d.title; }))
           .range([margin.top, height - margin.bottom])
           .paddingInner(0.4)
           .paddingOuter(0.4);

Here, instead of a linear scale we use a band scale, which is better suited to categorical or ordinal data like titles. The domain consists of the list of all titles - again - projected onto the size (height) of the chart. As you can see above, we also add padding to the scale to avoid overlapping titles. Part of our chart would end up with a mapping like this:

"1875 Fiji measles outbreak": 15.688811188811144
"1875-1876 Australia scarlet fever epidemic": 26.89510489510485
"1876 Ottoman Empire plague epidemic": 38.10139860139856
"1878 New Orleans yellow fever epidemic": 49.307692307692264
"1878 Mississippi Valley yellow fever epidemic": 60.51398601398597

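The constant spacing of about 11.2 pixels between the values above is the step of the band scale. The sketch below shows, in plain JavaScript, how such positions can be computed from the number of titles and the two paddings. It is a simplified stand-in for d3.scaleBand (default alignment, no rounding), and the function name bandScale and the range [0, 340] are assumptions chosen for the example.

```javascript
// Minimal stand-in for d3.scaleBand() with inner and outer padding.
function bandScale(domain, range, paddingInner, paddingOuter) {
    const n = domain.length;
    const [r0, r1] = range;
    // Distance between the starts of two neighbouring bands.
    const step = (r1 - r0) / Math.max(1, n - paddingInner + paddingOuter * 2);
    // Centre the bands in whatever space is left over.
    const start = r0 + (r1 - r0 - step * (n - paddingInner)) * 0.5;
    const positions = new Map(domain.map((key, i) => [key, start + step * i]));

    const scale = key => positions.get(key);
    scale.step = () => step;
    scale.bandwidth = () => step * (1 - paddingInner);  // height of one bar
    return scale;
}

const titles = [
    "1875 Fiji measles outbreak",
    "1875-1876 Australia scarlet fever epidemic",
    "1876 Ottoman Empire plague epidemic"
];
const ySketch = bandScale(titles, [0, 340], 0.4, 0.4);

// Positions come out at roughly 40, 140 and 240, with bars about 60 pixels high.
titles.forEach(t => console.log(t, ySketch(t)));
console.log(ySketch.bandwidth());
```
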
With the scales and axes updated, we now need to take care of the most important part of the chart - the bars. With D3.js, this process has two parts, built on the so-called enter and exit selections. First we remove the existing bars from the chart with the exit selection:

svg.selectAll(".bars")  // Select all elements with CSS class .bars
   .data([])  // Bind an empty array as the bars' data
   .exit()  // Get the exit selection: elements left without a datum
   .remove();  // Remove those elements from the DOM

As described in the comments, the code above starts by querying all the elements with the class .bars. Next, it binds an empty array as the dataset to this selection. On the third line it takes the exit selection, which, simply put, contains the elements that were there before and no longer have any data bound to them (we just bound an empty array, so that is all of them). Finally, the last line removes those elements from the chart.

After removing the old bars, we also need to put something back to be displayed. That's where the enter selection comes in:

bars = svg.selectAll(".bars")
          .data(dataSet)
          .enter()
          .append("rect");

Once again we select the same elements as before. This time, however, we bind our filtered dataset to the selection instead of an empty array and take the enter selection, which is the counterpart of exit: it holds a placeholder for every data entry that has no element yet. On the last line we use the append function which, well... appends one rect element to the SVG for each data entry, creating all our little bars. At this point we have all the bars, with all the data, but they don't have any attributes like width, position, color, etc. But we will fix that in the next section!

Note: The explanation of enter, exit and append here is veeery brief and I recommend checking out this article by Jonathan Soma for more context.
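If enter and exit still feel abstract, the following plain-JavaScript model may help. It is a simplified sketch of an index-based data join (no key function) and not D3's actual implementation. The function joinByIndex and the placeholder strings are made up for illustration.

```javascript
// Simplified model of a data join: elements and data are matched by index.
// "elements" stands in for the nodes already on the page.
function joinByIndex(elements, data) {
    const shared = Math.min(elements.length, data.length);
    return {
        update: elements.slice(0, shared),  // elements that keep a datum
        enter: data.slice(shared),          // data items without an element yet
        exit: elements.slice(shared)        // elements left without a datum
    };
}

const existingBars = ["rect#1", "rect#2", "rect#3"];

// Binding an empty array: every existing bar lands in the exit group.
const cleared = joinByIndex(existingBars, []);
console.log(cleared.exit.length);   // 3

// Binding fresh data to an empty selection: every row lands in the enter group.
const redrawn = joinByIndex([], ["row A", "row B"]);
console.log(redrawn.enter.length);  // 2
```

This also shows why the remove-everything-then-append approach works: after the exit step nothing is left on the page, so the whole filtered dataset ends up in the enter group.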

Animations

Just showing the updated data is no fun. So, to make it a little more enjoyable and visually pleasing for the viewer, we will add a few transitions for these data updates.

Same as when we updated the data, we will start with the X axis. This is how we create its animation/transition:

svg.select(".x.axis")          // Select elements with CSS classes .x and .axis
   .transition()               // Start transition
   .duration(1000)             // Make it last 1 second
   .call(
       d3.axisBottom(xScale)
         .ticks(15, ".0f")
   );

The snippet above might not be clear to you if you are not used to D3.js code, so let's start by saying what a transition actually is: a transition in D3.js is a form of animation where the starting point is the current state of the DOM and the ending point is the collection of styles, attributes and properties you specify.

With that, let's go over the code line by line. First, we select the element with the .x and .axis CSS classes, which in this case is the horizontal axis - this is the starting point of our animation. Next, we start the transition and set its duration to 1 second. After that we use the .call function, which takes the ending point of our transition as its parameter - in this case the bottom axis created from the xScale defined in the previous section, with roughly 15 ticks. The rest is D3.js magic.
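A small part of that magic can be sketched by hand. The snippet below models a single numeric attribute moving from its starting value to its ending value over the duration, using cubic in-out easing, which is the default easing of D3 transitions. The function names are hypothetical, and a real transition also handles timing, colors, strings and interruption.

```javascript
// Cubic in-out easing: slow start, fast middle, slow end.
function easeCubicInOut(t) {
    const u = t * 2;
    if (u <= 1) return (u * u * u) / 2;
    const v = u - 2;
    return (v * v * v + 2) / 2;
}

// Value of a numeric attribute at a given time into the transition.
function valueAt(from, to, elapsedMs, durationMs) {
    const t = Math.min(1, Math.max(0, elapsedMs / durationMs));
    return from + (to - from) * easeCubicInOut(t);
}

console.log(valueAt(250, 980, 0, 1000));     // 250, the starting state
console.log(valueAt(250, 980, 500, 1000));   // 615, halfway through
console.log(valueAt(250, 980, 1000, 1000));  // 980, the ending state
```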

Now, onto the Y axis. After understanding the X axis transition, this one is easy, as it is pretty much the same thing:

svg.select(".y.axis")
   .transition()
   .duration(1000)
   .call(
        d3.axisLeft(yScale)
   );

All we changed to make this work for the Y axis is the CSS class (.y), and we swapped axisBottom for axisLeft. That's it, we have the Y axis animated and rendered.

As in the previous section, we will finish with all the little bars. To animate them, we take the same approach as with the previous transitions, except that in this case we don't use .call but rather call attr for each attribute directly:

bars.transition()
    .duration(1000)
    .attr("x", function(d) { return xScale(d.start); })
    .attr("y", function(d) { return yScale(d.title); })
    .attr("width", function(d) { return xScale(d.end) - xScale(d.start);})
    .attr("fill", function(d) {
        return colors(d.start - d.end);
    });

This might seem complicated, but it really isn't. What we need to realize is that this is not a single animation but rather one animation per bar. For each of them, we want the ending point of the transition to be a bar whose x coordinate is the pixel position of its d.start, whose y coordinate is the Y position of the matching title on the Y axis, and whose width is the difference between the pixel positions of its d.end and d.start. As for the last attribute, we set its color based on its length (d.start - d.end), which is mapped to a predefined color scale.
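To make the x and width computation concrete, here is a minimal sketch for a single bar. It hard-codes the default mapping mentioned in the Filtering section (years [1875, 2020] onto pixels [250, 980]) instead of using D3. The names toPixels and barGeometry are hypothetical.

```javascript
// Linear mapping from years to pixels, written out by hand.
const toPixels = year => 250 + ((year - 1875) / (2020 - 1875)) * (980 - 250);

// Only start and end matter for the horizontal geometry of a bar.
const epidemic = { title: "1918 flu pandemic", start: 1918, end: 1920 };

const barGeometry = {
    x: toPixels(epidemic.start),                               // left edge in pixels
    width: toPixels(epidemic.end) - toPixels(epidemic.start)   // length in pixels
};
console.log(barGeometry);
```

Note that the width is a difference of two scaled values, not the raw number of years. With this mapping, the two-year epidemic ends up roughly 10 pixels wide.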

Sorting

At this point we could leave the chart as is and it would be just fine, but we can give the user a different (and possibly more readable) view of the data by adding sorting options. Three sorting buttons at the top allow the user to sort by the total span of an epidemic, its start year and its total death toll. Let's see how to implement this:

function drawSort(sort) {

    if(sort === "sortTotalDeathToll") {
        dataSet.sort(function(a, b) {
            return d3.descending(a.toll , b.toll);
        });
    }
    else if(sort === "sortStartYear") {
        dataSet.sort(function(a, b) {
            return d3.ascending(a.start , b.start);
        });
    }
    else if(sort === "sortTotalSpan") {
        dataSet.sort(function(a, b) {
            return d3.descending(a.span , b.span);
        });
    }

    yScale.domain(dataSet.map(function(d) { return d.title; }));

    // Perform bars transition (update Y attribute)
    // Perform Y axis transition
}

All the work is done by a single function called drawSort, which handles click events from the buttons mentioned above. Based on the button clicked, it decides which sorting to apply. In each case it sorts the dataset in ascending or descending order based on the respective attribute of each record. The sorted dataset is then applied to the vertical scale to update its domain in the same way as we did in the Filtering section above. Following that, we perform the same transitions as in the previous section, so the chart is redrawn in the newly sorted order.
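To try the sorting step outside the browser, here is a minimal sketch without D3. The comparators are simplified versions of d3.ascending and d3.descending (they don't treat null or NaN specially), and the rows are made-up records used only for illustration.

```javascript
// Simplified equivalents of the comparators used in drawSort.
const ascending = (a, b) => (a < b ? -1 : a > b ? 1 : 0);
const descending = (a, b) => (b < a ? -1 : b > a ? 1 : 0);

// Made-up records.
const rows = [
    { title: "A", start: 1900, span: 2, toll: 500 },
    { title: "B", start: 1850, span: 10, toll: 90000 },
    { title: "C", start: 1950, span: 1, toll: 1200 }
];

// Copy before sorting so the original order stays available.
const byToll = rows.slice().sort((a, b) => descending(a.toll, b.toll));
const byStart = rows.slice().sort((a, b) => ascending(a.start, b.start));

// The sorted titles are what would become the new domain of the band scale.
console.log(byToll.map(d => d.title));   // order: B, C, A
console.log(byStart.map(d => d.title));  // order: B, A, C
```

Unlike this sketch, drawSort sorts dataSet in place, which is fine there because the chart is redrawn from the same array afterwards.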

Conclusion

I want to end this article by saying that not all charts and plots are created equal. Some of them - like this kind of horizontal bar chart - should in my opinion get more attention and be used more frequently. So, hopefully this visualization and brief explanation gave you enough information to maybe use this chart in your next data visualization. If you want to see the full source code for this chart, you can head over to my repository here, and feel free to leave feedback or ask questions in issues, or just give it a star if you like this kind of content. 😉