Better Data Visualization Using Beeswarm Chart

Martin Heinz. Originally published at martinheinz.dev ・11 min read

A single dataset can convey a lot of different information to the viewer. It all depends on how you visualize the data, or in other words, on which kind of chart or plot you choose. Most of the time people just grab a bar chart or a pie chart. There are, however, more interesting charts and plots you can use to communicate information from your data to your audience, one of them being the beeswarm chart.

Note: All the source code (including documentation) from this article can be found at https://github.com/MartinHeinz/charts and a live demo is available at https://martinheinz.github.io/charts/beeswarm/

Beeswarm Chart

Bee-what?

First time hearing about beeswarm chart? Alright, let's first talk about what it actually is:

A beeswarm chart is a one-dimensional chart (or plot), in other words a chart that shows all the information on a single axis (usually the X axis). It displays values as a collection of points, similar to a scatter plot.

This kind of chart is very useful when you want to display a lot of data points at once - e.g. one node for each country - which would be a problem with bar chart or pie chart. Just imagine pie chart with 150 wedges - no thanks.

Additionally, it makes it easy to spot outliers as they will not be part of the swarm.

Another feature of this chart is that you can nicely visualize different scales (linear and logarithmic) and transition between them as well as color the points to add additional dimension (e.g. continent of country).

Enough talking though, let's see an example:

Beeswarm Demo

What is this dataset we are going to be using here, actually? Well, it's the WHO Suicide Statistics data, which can be found on kaggle.com. Odd choice maybe, eh? Well, it's real data that fits this type of chart quite well. So, let's see how well we can use it!

What We Will Need

Before diving into the code, let's look at the libraries that we will use:

For all the plotting and visualization we will use D3.js and plain old JavaScript. In case you are not familiar with D3.js, it stands for Data-Driven Documents and is a JavaScript library for manipulating documents based on data. The main advantage of D3.js is its flexibility: all it gives you are functions to manipulate data, and the page elements bound to it, efficiently.

In this article we will use D3.js version 5, and all you need to do to start using it is include <script src="https://d3js.org/d3.v5.min.js"></script> in your HTML (Complete code listing here).

Apart from D3.js we will also use Material Design Lite (MDL) to bring nicer user experience. This is very much optional, but everybody likes some fancy material design buttons and dropdowns, right?

Similarly to D3.js, we just need to include one script tag to start using it - <script defer src="https://code.getmdl.io/1.3.0/material.min.js"></script> (Complete code listing here).

The Code

Setting The Stage

Before we start manipulating any data, we first need to do some initial setup:


let height = 400;
let width = 1000;
let margin = ({top: 0, right: 40, bottom: 34, left: 40});

// Data structure describing chart scales
let Scales = {
    lin: "scaleLinear",
    log: "scaleLog"
};

// Data structure describing measure of displayed data
let Count = {
    total: "total",
    perCap: "perCapita"
};

// Data structure describing legend fields value
let Legend = {
    total: "Total Deaths",
    perCap: "Per Capita Deaths"
};

let chartState = {};

chartState.measure = Count.total;
chartState.scale = Scales.lin;
chartState.legend = Legend.total;

First we define some global variables for width, height and margin as well as 3 data structures for scale, measure of data and plot legend, which we will use throughout rest of the code. We also use those to define initial state of chart, which is stored in chartState variable.

Next thing we define, are colors for all the nodes (circles) of the plot:

// Colors used for circles depending on continent/geography
let colors = d3.scaleOrdinal()
    .domain(["asia", "africa", "northAmerica", "europe", "southAmerica", "oceania"])
    .range(['#D81B60','#1976D2','#388E3C','#FBC02D','#E64A19','#455A64']);

d3.select("#asiaColor").style("color", colors("asia"));
d3.select("#africaColor").style("color", colors("africa"));
d3.select("#northAmericaColor").style("color", colors("northAmerica"));
d3.select("#southAmericaColor").style("color", colors("southAmerica"));
d3.select("#europeColor").style("color", colors("europe"));
d3.select("#oceaniaColor").style("color", colors("oceania"));

To create coloring scheme we use d3.scaleOrdinal which creates mapping from a domain (continent names) to range (color codes). Then we apply these colors to CSS IDs, which are given to checkboxes in the HTML GUI.
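
To make the idea of an ordinal mapping more concrete, here is a minimal plain JavaScript sketch of what such a scale does conceptually. The makeOrdinalScale helper is hypothetical and is not how D3 implements d3.scaleOrdinal (for one, the real scale handles values outside the domain differently), so treat it only as an illustration:

```javascript
// Hypothetical stand-in for an ordinal scale (NOT D3's implementation):
// each domain entry is paired with the range entry at the same index.
function makeOrdinalScale(domain, range) {
    const lookup = new Map(
        domain.map((key, i) => [key, range[i % range.length]])
    );
    return function (key) {
        // Keys outside the domain simply yield undefined in this sketch
        return lookup.get(key);
    };
}

const continentColors = makeOrdinalScale(
    ["asia", "africa", "northAmerica", "europe", "southAmerica", "oceania"],
    ["#D81B60", "#1976D2", "#388E3C", "#FBC02D", "#E64A19", "#455A64"]
);

console.log(continentColors("asia"));   // "#D81B60"
console.log(continentColors("europe")); // "#FBC02D"
```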

Now we are getting into code for the actual chart. Following lines will prepare the SVG which will be our drawing area:

let svg = d3.select("#svganchor")
    .append("svg")
    .attr("width", width)
    .attr("height", height);

let xScale = d3.scaleLinear()
    .range([margin.left, width - margin.right]);

svg.append("g")
    .attr("class", "x axis")
    .attr("transform", "translate(0," + (height - margin.bottom) + ")");

// Create line that connects node and point on X axis
let xLine = svg.append("line")
    .attr("stroke", "rgb(96,125,139)")
    .attr("stroke-dasharray", "1,2");

The first call above, which creates the svg variable, finds the <div> with the svganchor ID and appends an SVG element to it with the width and height we defined earlier. Next, we create a function called xScale, which is very similar to d3.scaleOrdinal used earlier. It also creates a mapping between domain and range, but with a continuous domain rather than a discrete one. You probably already noticed that we didn't specify a domain here. That's because we don't know the extent of our dataset yet, so we left it at its default ([0, 1]) for the time being.
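
As a rough illustration of what a continuous scale computes, the following plain JavaScript sketch maps a value from a domain onto a pixel range by linear interpolation. The makeLinearScale helper is a hypothetical simplification and not D3's implementation; the margin and width values are the ones defined at the start of the article:

```javascript
// Hypothetical simplification of a linear scale (not D3's implementation)
function makeLinearScale(domain, range) {
    const [d0, d1] = domain;
    const [r0, r1] = range;
    return function (value) {
        // Position of value within the domain, as a fraction from 0 to 1
        const t = (value - d0) / (d1 - d0);
        // Same fraction applied to the pixel range
        return r0 + t * (r1 - r0);
    };
}

const margin = { top: 0, right: 40, bottom: 34, left: 40 };
const width = 1000;

// Default domain [0, 1] mapped onto the drawable width of the chart
const x = makeLinearScale([0, 1], [margin.left, width - margin.right]);

console.log(x(0));   // 40
console.log(x(0.5)); // 500
console.log(x(1));   // 960
```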

After that, we append a <g> element container to the existing SVG element. This element will be used as the container for the X axis and its ticks, which will be appended later when we actually render the axis. We can however set its CSS class and move it to the bottom of the SVG now, so that we don't have to deal with it later.

Final part of this snippet creates line that connects node and point on the X axis while hovering over said circle. You can see that on the image below:

[Image: dashed line connecting a hovered circle to its position on the X axis]

The last thing we want to do before we jump into manipulating the dataset is to create a simple node tooltip:

// Create tooltip div and make it invisible
let tooltip = d3.select("#svganchor").append("div")
    .attr("class", "tooltip")
    .style("opacity", 0);

For the time being the tooltip is just a <div> that we put into anchor of our chart. We also make it invisible for now as we will dynamically set its content and opacity when we deal with mouse move events (hovering).

Loading The Data

Now is finally time to load the data. We do that using d3.csv function. This function uses fetch API to get CSV file from URL and returns Promise, which requires following structure of code:

d3.csv("https://martinheinz.github.io/charts/data/who_suicide_stats.csv").then(function(data) {
    // Here we can process data (later snippets refer to the loaded rows as dataSet)
}).catch(function (error) {
    // Handle error...
    if (error) throw error;
});

All our remaining code belongs in the body of above anonymous function, as that is where the loaded data is available to us.

Here are also examples of the data before and after it is loaded, to better visualize its structure:

Before:

country total population perCapita continent
Argentina 2987 38859125 0.13 southAmerica
Armenia 67 2810664 0.42 europe
Aruba 2 97110 0.486 northAmerica
Australia 2608 21600180 0.083 oceania
Austria 1291 8079615 0.063 europe

After:

0: {country: "Argentina", total: "2987", population: "38859125", perCapita: "0.13", continent: "southAmerica"}
1: {country: "Armenia", total: "67", population: "2810664", perCapita: "0.42", continent: "europe"}
2: {country: "Aruba", total: "2", population: "97110", perCapita: "0.486", continent: "northAmerica"}
3: {country: "Australia", total: "2608", population: "21600180", perCapita: "0.083", continent: "oceania"}
4: {country: "Austria", total: "1291", population: "8079615", perCapita: "0.063", continent: "europe"}
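
Note that every field in the loaded rows is a string, which is why the drawing code coerces values with a unary plus (+d[chartState.measure]). The sketch below, using two rows copied from the listing above and a hypothetical toNumericRow helper, shows why that coercion matters:

```javascript
// Two rows in the shape shown above: every value is a string
const rows = [
    { country: "Argentina", total: "2987", population: "38859125", perCapita: "0.13", continent: "southAmerica" },
    { country: "Armenia", total: "67", population: "2810664", perCapita: "0.42", continent: "europe" }
];

// Hypothetical helper: convert the numeric columns with unary plus
function toNumericRow(row) {
    return {
        ...row,
        total: +row.total,
        population: +row.population,
        perCapita: +row.perCapita
    };
}

const numericRows = rows.map(toNumericRow);

// Strings compare character by character, so "2987" sorts before "67"
console.log(rows[0].total < rows[1].total);               // true
// Numbers compare by value
console.log(numericRows[0].total < numericRows[1].total); // false
```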

Listeners

Before processing the data any further, let's first set up listeners that will react to button clicks in the GUI. We want the user to be able to switch between the "total" and "per capita" measurements as well as between linear and logarithmic scale.

// Listen to click on "total" and "per capita" buttons and trigger redraw when they are clicked
d3.selectAll(".measure").on("click", function() {
    let thisClicked = this.value;
    chartState.measure = thisClicked;
    if (thisClicked === Count.total) {
        chartState.legend = Legend.total;
    }
    if (thisClicked === Count.perCap) {
        chartState.legend = Legend.perCap;
    }
    redraw();
});

// Listen to click on "scale" buttons and trigger redraw when they are clicked
d3.selectAll(".scale").on("click", function() {
    chartState.scale = this.value;
    redraw();
});

Our HTML GUI (source can be found here: https://github.com/MartinHeinz/charts/blob/master/beeswarm/index.html) contains 2 sets of buttons. The first set, responsible for switching between the "total" and "per capita" visualization, has the CSS class .measure attached. We use this class to query this group of buttons, as you can see above. When one of these 2 buttons is clicked, we take the value of the clicked button and change the chart state accordingly, as well as the legend text, which shows the type of measure used.

The second set (pair) of buttons which switches between linear and logarithmic scale, also has CSS class attached (called .scale) and similar to previous one - updates the state of chart based on which button is clicked.

Both of these listeners also trigger redrawing of the whole chart to reflect the configuration change. This is performed using the redraw function, which we will go over in next section.

Apart from those 4 buttons, we also have a few checkboxes in the GUI. Clicking on those filters which continents' countries are displayed.

// Trigger filter function whenever checkbox is ticked/unticked
d3.selectAll("input").on("change", filter);

Handling these checkbox clicks is responsibility of listener above. All it does, is trigger filter function, which adds/removes nodes from selection based on which checkboxes are checked and which are not.
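
The filter function itself is not shown in this article. As a rough idea of the kind of work it has to do, here is a minimal sketch that keeps only the rows whose continent is currently checked. The filterByContinents name and its parameters are assumptions made for illustration, not the author's code, which also has to read the checkbox state from the page and trigger a redraw:

```javascript
// Hypothetical helper (not the article's filter function):
// keep only rows whose continent is in the list of checked continents
function filterByContinents(rows, checkedContinents) {
    const allowed = new Set(checkedContinents);
    return rows.filter(function (row) {
        return allowed.has(row.continent);
    });
}

const sampleRows = [
    { country: "Argentina", continent: "southAmerica" },
    { country: "Armenia", continent: "europe" },
    { country: "Australia", continent: "oceania" },
    { country: "Austria", continent: "europe" }
];

const visible = filterByContinents(sampleRows, ["europe", "oceania"]);
console.log(visible.map((row) => row.country)); // ["Armenia", "Australia", "Austria"]
```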

Last event listener we have here is a big one. It takes care of showing and hiding the tooltips when hovering over country circles:

// Show tooltip when hovering over circle (data for respective country)
d3.selectAll(".countries").on("mousemove", function(d) {
    tooltip.html(`Country: <strong>${d.country}</strong><br>
                  ${chartState.legend.slice(0, chartState.legend.indexOf(","))}: 
                  <strong>${d3.format(",")(d[chartState.measure])}</strong>
                  ${chartState.legend.slice(chartState.legend.lastIndexOf(" "))}`)
        .style('top', d3.event.pageY - 12 + 'px')
        .style('left', d3.event.pageX + 25 + 'px')
        .style("opacity", 0.9);

    xLine.attr("x1", d3.select(this).attr("cx"))
        .attr("y1", d3.select(this).attr("cy"))
        .attr("y2", (height - margin.bottom))
        .attr("x2",  d3.select(this).attr("cx"))
        .attr("opacity", 1);

}).on("mouseout", function(_) {
    tooltip.style("opacity", 0);
    xLine.attr("opacity", 0);
});

The code above might look complicated, but it's actually pretty straightforward. We first select all the nodes using .countries CSS class. We then bind the mousemove event to all of these nodes. During the event we set HTML of tooltip to show information about this node (country name, death count). Also, we change its opacity so that it's visible while user points at the circle and we set its position to be on the right of mouse cursor.

The rest of the body of this function renders dashed line connecting the circle and X axis to highlight where the value belongs on the scale.

We also need to handle events for when we move mouse out of the circles, otherwise the tooltip and line would be always visible, which is what the mouseout event handler takes care of - it sets opacity of these elements to 0, to make them invisible.

These event listeners are nice and all, but we need to actually process and draw the data to make any use of them. So, let's do just that!

Drawing It All

Majority of the data processing is done in one function called redraw, which we invoke when the page is loaded for the first time and during various events, which we saw in previous section.

This function uses chartState to decide how it should draw the chart. At the beginning it sets type of scale to linear or logarithmic based on chartState.scale and decides the extent of the chart domain by finding min/max value in dataset's total or perCapita column based on value of chartState.measure:

function redraw() {

    // Set scale type based on button clicked
    if (chartState.scale === Scales.lin) {
        xScale = d3.scaleLinear().range([ margin.left, width - margin.right ]);
    }

    if (chartState.scale === Scales.log) {
        xScale = d3.scaleLog().range([ margin.left, width - margin.right ]);
    }

    xScale.domain(d3.extent(dataSet, function(d) {
        return +d[chartState.measure];
    }));

    ...  // Next snippet...
}
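
The domain computation above boils down to finding the smallest and largest numeric value in one column. A minimal plain JavaScript sketch of that step is shown below. The numericExtent helper is hypothetical and only approximates d3.extent (it returns [Infinity, -Infinity] for empty input, which d3.extent does not):

```javascript
// Hypothetical approximation of the extent computation for one column
function numericExtent(rows, column) {
    let min = Infinity;
    let max = -Infinity;
    for (const row of rows) {
        const value = +row[column]; // coerce the string field to a number
        if (Number.isNaN(value)) continue; // skip values that are not numeric
        if (value < min) min = value;
        if (value > max) max = value;
    }
    return [min, max];
}

const sample = [
    { country: "Argentina", total: "2987" },
    { country: "Armenia", total: "67" },
    { country: "Aruba", total: "2" }
];

console.log(numericExtent(sample, "total")); // [2, 2987]

// The logarithm of zero is -Infinity, so a domain that touches zero
// cannot be placed on a logarithmic axis
console.log(Math.log10(0)); // -Infinity
```

One practical consequence is that the logarithmic variant only makes sense when the smallest value in the chosen column is greater than zero.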

Another thing we need to create based on chartState is the X axis. Considering the orientation of the chart, we will use the bottom axis (axisBottom) and give it 10 ticks. If we are visualizing total numbers we will go with a format that uses decimal notation with an SI prefix (s) and 1 significant digit (.1). Otherwise it will be fixed point notation (f) with one digit after the decimal point (.1).

let xAxis;
// Set X axis based on new scale. If chart is set to "per capita" use numbers with one decimal point
if (chartState.measure === Count.perCap) {
    xAxis = d3.axisBottom(xScale)
        .ticks(10, ".1f")
        .tickSizeOuter(0);
}
else {
    xAxis = d3.axisBottom(xScale)
        .ticks(10, ".1s")
        .tickSizeOuter(0);
}

d3.transition(svg).select(".x.axis")
            .transition()
            .duration(1000)
            .call(xAxis);
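
To get a feel for what the two tick formats used above produce, the sketch below imitates them in plain JavaScript. Both functions are hand-rolled approximations written for this illustration, not d3.format itself, and formatOneSignificantSI assumes values of at least 1:

```javascript
const SI_PREFIXES = ["", "k", "M", "G", "T"];

// Rough imitation of ".1s": one significant digit plus an SI prefix.
// Assumption: value >= 1 (smaller values are not handled in this sketch).
function formatOneSignificantSI(value) {
    if (value < 1) throw new RangeError("this sketch only handles values >= 1");
    // toExponential(0) rounds to one significant digit, e.g. 2987 -> "3e+3"
    const [digit, exponentText] = value.toExponential(0).split("e");
    const exponent = Number(exponentText);
    // Pick the prefix for the nearest lower power of 1000
    const group = Math.min(Math.floor(exponent / 3), SI_PREFIXES.length - 1);
    const scaled = Number(digit) * Math.pow(10, exponent - group * 3);
    return String(scaled) + SI_PREFIXES[group];
}

// Rough imitation of ".1f": fixed point with one digit after the decimal point
function formatOneDecimal(value) {
    return value.toFixed(1);
}

console.log(formatOneSignificantSI(2987));     // "3k"
console.log(formatOneSignificantSI(21600180)); // "20M"
console.log(formatOneDecimal(0.486));          // "0.5"
```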

When the axis and scale are prepared, we execute transition that takes 1 second. During this 1 second the bottom axis is generated by .call(xAxis) by executing the axisBottom generator.

What follows, is the simulation for moving the nodes along the X and Y axis to their desired position:

let simulation = d3.forceSimulation(dataSet)
    .force("x", d3.forceX(function(d) {
        return xScale(+d[chartState.measure]);
    }).strength(2))
    .force("y", d3.forceY((height / 2) - margin.bottom / 2))
    .force("collide", d3.forceCollide(9))
    .stop();

// Manually run simulation
for (let i = 0; i < dataSet.length; ++i) {
    simulation.tick(10);
}

This is one of the more complicated snippets in this article, so let's go over it line by line. On the first line we create a simulation with the specified dataset. To this simulation we apply a positioning force that pushes nodes towards their desired position along the X axis. This desired position is returned by the xScale function, which calculates it by mapping the "total" or "perCapita" column to the physical size (range) of the chart. After that we set the strength of this force to 2 using the strength function, so that nodes are pulled towards their target X position more firmly than with the default strength.

The same way we applied a force along the X axis, we also need to apply a force along the Y axis, this time pushing nodes towards the middle line of the chart. The last force we apply is the collision force, which keeps the nodes from overlapping. More specifically, it treats each node as a circle with a radius of 9 pixels, so the centers of any two nodes are pushed at least 18 pixels apart (the drawn circles have a radius of 6, which leaves a small gap between them). Finally, we call the stop function to prevent the simulation from running automatically and instead execute it in the for loop on the lines below it.
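
For intuition about what a positioning force does on each tick, here is a heavily simplified one-node model in plain JavaScript. It is only a sketch: the simulateForceX function is hypothetical, it ignores the Y and collision forces, and the constants are what I understand to be the d3-force defaults (alpha starting at 1 and cooling over roughly 300 ticks, velocities damped by a factor of 0.6 per tick), so treat them as assumptions:

```javascript
// Hypothetical one-node model of a positioning force along the X axis.
// Assumed constants: alpha starts at 1 and decays towards 0 over about
// 300 ticks; velocity is multiplied by 0.6 on every tick.
function simulateForceX(startX, targetX, strength, ticks) {
    const alphaDecay = 1 - Math.pow(0.001, 1 / 300);
    const velocityDamping = 0.6;
    let alpha = 1;
    let x = startX;
    let vx = 0;
    for (let i = 0; i < ticks; ++i) {
        alpha += (0 - alpha) * alphaDecay;      // cool the simulation down
        vx += (targetX - x) * strength * alpha; // pull towards the target
        vx *= velocityDamping;                  // friction
        x += vx;                                // move the node
    }
    return x;
}

// A node starting at x = 0 (like the circles in the chart) pulled towards
// x = 500 with strength 2, as in the snippet above
const finalX = simulateForceX(0, 500, 2, 300);
console.log(Math.abs(finalX - 500) < 0.01); // true
```

With a strength as high as 2 the node overshoots its target on the first few ticks and then settles, which is harmless here because the simulation is run to completion before anything is drawn.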

We created and ran the simulation, but against what? Well, the nodes (circles) created by following code:

let countriesCircles = svg.selectAll(".countries")
    .data(dataSet, function(d) { return d.country });

countriesCircles.exit()
    .transition()
    .duration(1000)
    .attr("cx", 0)
    .attr("cy", (height / 2) - margin.bottom / 2)
    .remove();

countriesCircles.enter()
    .append("circle")
    .attr("class", "countries")
    .attr("cx", 0)
    .attr("cy", (height / 2) - margin.bottom / 2)
    .attr("r", 6)
    .attr("fill", function(d){ return colors(d.continent)})
    .merge(countriesCircles)
    .transition()
    .duration(2000)
    .attr("cx", function(d) { return d.x; })
    .attr("cy", function(d) { return d.y; });

Here, we begin by querying all the nodes and joining the dataset to them, keyed by country name. The next 2 calls, to the exit and enter selections respectively, deal with the situation when nodes are removed from and added to the selection (e.g. when checkboxes are ticked/unticked or when the page is loaded). First, for the exit selection, we create a transition that takes 1 second and sets the center point on the X axis to zero and the center point on the Y axis to the middle of the chart. This way, when these nodes are added back into the chart, they will come out from a single point, as you can see when clicking checkboxes in the demo. After the transition finishes, the nodes are removed.

The remainder of the snippet, the enter selection, is what actually sets all the attributes of the nodes. We set each node's CSS class, its X and Y axis center points and its radius, and fill it with a color based on the continent it belongs to. Then we merge this selection into the rest of the nodes (circles) and create a transition that moves them to the desired X and Y coordinates over the next 2 seconds.
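
The enter, update and exit groups can be thought of as the result of comparing the keys already on screen with the keys in the new data. The sketch below shows that comparison in plain JavaScript. The joinByKey helper is hypothetical and only mimics the bookkeeping, not D3's actual data join:

```javascript
// Hypothetical illustration of a keyed join (not D3's implementation)
function joinByKey(currentKeys, newRows, keyOf) {
    const current = new Set(currentKeys);
    const incoming = new Set(newRows.map(keyOf));
    return {
        // In the new data but not on screen yet: circles to create
        enter: newRows.filter((row) => !current.has(keyOf(row))).map(keyOf),
        // In both: circles that stay and only move
        update: newRows.filter((row) => current.has(keyOf(row))).map(keyOf),
        // On screen but missing from the new data: circles to remove
        exit: currentKeys.filter((key) => !incoming.has(key))
    };
}

const onScreen = ["Argentina", "Armenia", "Aruba"];
const newData = [
    { country: "Armenia" },
    { country: "Aruba" },
    { country: "Austria" }
];

const joined = joinByKey(onScreen, newData, (row) => row.country);
console.log(joined.enter);  // ["Austria"]
console.log(joined.update); // ["Armenia", "Aruba"]
console.log(joined.exit);   // ["Argentina"]
```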

Conclusion

In this article we went deep into implementing a beeswarm chart with D3.js. The takeaway from this article though shouldn't be this specific implementation, but the fact that you might want to consider non-traditional types of charts and plots next time you are visualizing your data, as it might help you better communicate desired information to your audience.

If you want to check out the complete code listing from this article, please visit my repository here: https://github.com/MartinHeinz/charts. In this repo you can also find the datasets and sources used, as well as other charts and plots implemented with D3.js, like this parallel coordinate chart (next article 😉):

Parallel Coordinate Chart

My name is Martin Heinz and I'm a software developer/DevOps engineer. I'm from Slovakia, living in Bratislava.
