Kevin Lien

D3 Histograms and Fixing the Bin Problem

d3.js is an extremely powerful charting library and exceptionally useful when it comes to representing data. But with great power comes great responsibility... actually, not responsibility so much as great problems. You can find samples of all kinds of d3 charts, but in my experience the examples posted in most galleries use very specific data sets that make the chart look great, and real-world data isn't always so nicely formatted. One problem I've seen come up again and again involves histogram charts.

Histograms are a great way to summarize distribution data in a really simple chart. d3 has built-in functionality that works pretty well for histograms, but a lot of the time pretty well doesn't cut it. When you look at a sample d3 histogram, the data set is generally configured so that everything fits neatly into exact bins, and just like magic the histogram is drawn. But what happens when you want your data charted in 10 bins and it ranges from zero to some random number like 10.47? d3 tries to force the chart to conform to the data, and it does an OK job, but sometimes it just looks flat-out wrong.

Take this example. There are 4 students who are being dropped into various bins based on the number of minutes they have studied. The first bin represents 3 students who have studied zero minutes, and the last bin represents 1 student who has studied 24.6 minutes.

That last sliver of a line is technically correct. The student falls in the 24 - 25 bin, but the chart doesn't show a full bar width as expected. Because the data stops at 24.6, the bar is only 0.6 of a bin wide, while every other bar on the chart is a full bin wide. Definitely not the ideal outcome. When you use d3's automatic bin() feature, this is often the result. Here is d3 code that can be used to automatically bin data for charting:

``````
// The number of bins that should be generated
const numberOfBins = 25;

// Use d3 to generate the bin array of all values automatically
const histogram = d3
  .bin()
  .domain(x.domain())
  .value(d => d.value)
  .thresholds(numberOfBins);

// Save the array of bins to a constant
const bins = histogram(values);
``````
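To see where the sliver comes from, here is a minimal sketch in plain JavaScript (no d3 required). It assumes the thresholds land on whole minutes, which is what a step of 1 would give for a domain of 0 to 24.6. The names `domainMax`, `step` and `edges` are made up for illustration and are not part of d3.

```javascript
// Hypothetical values for illustration: the domain runs from 0 to 24.6 minutes
// and we assume the thresholds fall on whole numbers (a step of 1).
const domainMax = 24.6;
const step = 1;

// Build { x0, x1 } edges: every bin starts on a whole number, and the last bin
// is clamped to the end of the domain.
const edges = [];
for (let x0 = 0; x0 < domainMax; x0 += step) {
  edges.push({ x0, x1: Math.min(x0 + step, domainMax) });
}

const firstWidth = edges[0].x1 - edges[0].x0;
const lastBin = edges[edges.length - 1];
const lastWidth = lastBin.x1 - lastBin.x0;

console.log(edges.length); // 25
console.log(firstWidth); // 1
console.log(lastWidth.toFixed(1)); // 0.6
```

Every bin except the last is 1 minute wide. The last one is cut off where the domain ends, so it is drawn narrower than the rest.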

Everything is technically working and it's charting, but that last bin is a problem. That problem appears in questions over and over on StackOverflow. Somehow that last bin needs to be tweaked to be the correct width. My thinking was to take the width of the first bin in the array of bin values (the difference between its x0 and x1 coordinates) and simply extend the x1 coordinate of the last bin so that it has the same width. That seems logical: the axes are automatically generated, so the chart should render an axis of the correct length accordingly. A simple fix to the array and the width is correct:

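``````
// Save the array of bins to a constant
const bins = histogram(values);

// Width of a full bin, taken from the first bin
const binWidth = bins[0].x1 - bins[0].x0;

// Extend the last bin so it is as wide as the others
bins[bins.length - 1].x1 = bins[bins.length - 1].x0 + binWidth;
``````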

The bin width issue is fixed, but now there's a new problem! The xAxis range and domain have to be declared up front so that d3.bin() knows how much space the chart will take up and can bin the values accordingly. Adding the extra width to the last bin pushes the bar off the chart. To fix that, the xAxis would need to be updated, but that would change the bin sizes and you're back to square one. Frustratingly, the d3 bin() function only works well when the data sets are nicely formatted, and in my experience that's usually unrealistic.
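The overflow is easy to confirm with the numbers from the student example. This sketch uses hand-written bin objects in place of d3's output, so the values (whole-minute bins, a domain of 0 to 24.6) are assumptions rather than real d3 results, and `xDomain` and `sampleBins` are illustrative names.

```javascript
// Hand-written stand-ins for the first and last bins d3 would return
// (assumed values: whole-minute bins, domain 0 to 24.6).
const xDomain = [0, 24.6];
const sampleBins = [
  { x0: 0, x1: 1 },
  { x0: 24, x1: 24.6 },
];

// Apply the "extend the last bin" fix.
const fullWidth = sampleBins[0].x1 - sampleBins[0].x0;
const last = sampleBins[sampleBins.length - 1];
last.x1 = last.x0 + fullWidth;

// The last bin now ends past the domain, so the bar runs beyond the axis.
const overflow = last.x1 - xDomain[1];
console.log(last.x1); // 25
console.log(overflow > 0); // true
```

The bar now ends at 25 while the scale still stops at 24.6, which is why it spills over the edge of the chart.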

When doing a deep dive into what the d3 bin() function does, I realized that instead of letting d3 create the bin sizes, you can force it to use your own bin widths by passing a custom array of values to thresholds() instead of a single number.

The custom array of values is created by dividing the full length of the xAxis (xAxis.scale().domain()[1]) by the number of bins (numberOfBins) to get the individual bin width, then multiplying that width by the current index (* i). Note that this assumes the domain starts at zero. That array gets passed to the thresholds() function.

``````
// Set the number of bins
const numberOfBins = 25;

// Build equally spaced thresholds: bin width multiplied by the index
const thresholdArr = [...Array(numberOfBins)].map(
  (item, i) => (xAxis.scale().domain()[1] / numberOfBins) * i
);

// Generate the final bins array
const histogram = d3
  .bin()
  .domain(x.domain())
  .value(d => d.value)
  .thresholds(thresholdArr);

// Save the bins to a constant
const bins = histogram(values);
``````

That's the expected look! Generating the thresholds outside of d3 and then feeding it the array manually does the trick. Until d3 updates its bin functionality, this is a simple way to get around that last bin issue. Hopefully it will help other people who will inevitably run into the same issue that I had.
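If the domain does not start at zero, the same idea can be wrapped in a small helper. This is only a sketch: `equalWidthThresholds` is a hypothetical name, not a d3 function. It leaves out the lower bound itself, on the assumption that a threshold sitting exactly on the start of the domain does not separate two bins.

```javascript
// Build the interior thresholds for `count` equal-width bins across [min, max].
// The lower bound is skipped because it is not a boundary between two bins.
function equalWidthThresholds(min, max, count) {
  const width = (max - min) / count;
  return Array.from({ length: count - 1 }, (_, i) => min + width * (i + 1));
}

// The student example: 25 bins between 0 and 24.6 minutes.
const thresholds = equalWidthThresholds(0, 24.6, 25);
console.log(thresholds.length); // 24
console.log(thresholds[0].toFixed(3)); // 0.984

// A domain that does not start at zero.
console.log(equalWidthThresholds(10, 20, 5)); // [ 12, 14, 16, 18 ]
```

With the `x` scale from the code above, the result could be passed along as `.thresholds(equalWidthThresholds(x.domain()[0], x.domain()[1], numberOfBins))`.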