Correlation without tears

#python #correlation #statistics

Assessing the degree of correlation between two numeric series is a notoriously challenging task in every scientific discipline, and a crucial part of scientific research in general (ever wondered whether there's a correlation between Internet Explorer usage and the murder rate?).

Over the years, various coefficients have been proposed for series with different properties (most notably Pearson's, Spearman's and Kendall's), and the quest for a "universal" correlation coefficient has often been unproductive.
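A quick way to see the limitation: for y = x² on a grid symmetric around zero, y is completely determined by x, yet Pearson's and Spearman's coefficients are both essentially zero, because they only capture linear and monotonic association respectively. Here's a minimal pure-Python sketch; the helper names are my own, and in practice `scipy.stats.pearsonr` and `scipy.stats.spearmanr` do the same job.

```python
import math

def pearson(x, y):
    """Pearson's linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_ranks(v):
    """Ranks 1..n, where tied values share the mean of their ranks."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    ranks = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to v[order[i]]
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# A deterministic but non-monotonic relationship: y = x**2
xs = [i / 10 for i in range(-50, 51)]
ys = [v * v for v in xs]
print(f"Pearson:  {pearson(xs, ys):+.3f}")
print(f"Spearman: {spearman(xs, ys):+.3f}")
```

Kendall's tau is also zero on this data, since it too only measures monotonic association.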

It took me a while to digest the math behind Sourav Chatterjee's new paper (https://arxiv.org/abs/1909.10140), but once I implemented the proposed coefficient in Python, I was surprised by how well it performed on arbitrary numeric series (not necessarily monotonic) compared to most existing coefficients. It's also straightforward to compute: it only needs a sort and some rank counting.
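For reference, this is the coefficient as defined in the paper. Sort the n pairs by X, so that X_(1) ≤ ... ≤ X_(n) (ties among the X values are broken uniformly at random), and let Y_(i) be the Y value paired with X_(i). Define r_i as the number of j such that Y_(j) ≤ Y_(i), and l_i as the number of j such that Y_(j) ≥ Y_(i). Then the general formula, and its simplified form when there are no ties among the Y values, are:

```latex
\xi_n(X, Y) = 1 - \frac{n \sum_{i=1}^{n-1} \lvert r_{i+1} - r_i \rvert}{2 \sum_{i=1}^{n} l_i \,(n - l_i)},
\qquad
\xi_n(X, Y) = 1 - \frac{3 \sum_{i=1}^{n-1} \lvert r_{i+1} - r_i \rvert}{n^2 - 1} \quad \text{(no ties among the } Y_i\text{)}
```

Everything reduces to sorting and counting, so the coefficient can be computed in O(n log n) time. Unlike Pearson's, Spearman's and Kendall's coefficients, it is asymmetric by design: ξ_n(X, Y) estimates how much Y depends on X. As n grows, it converges to a limit that is 0 if and only if X and Y are independent, and 1 if and only if Y is a measurable function of X (assuming Y is not constant).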

So I've put together a Gist notebook that shows how the new coefficient works on some simple data with increasing levels of noise.
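The notebook itself isn't reproduced here, but a minimal pure-Python sketch of the same kind of experiment could look like the following. The function name `xi_correlation`, its `seed` parameter, and the sine-plus-Gaussian-noise test data are my own illustrative choices, not taken from the notebook or from any library. The function follows the tie-aware formula from the paper: sort the pairs by x (breaking ties in x at random), count ranks of y, and sum the absolute differences of consecutive ranks.

```python
import bisect
import math
import random

def xi_correlation(x, y, seed=None):
    """Chatterjee's xi_n(X, Y): how strongly y depends on x (hypothetical helper).

    Ties in x are broken uniformly at random, as in the paper; ties in y
    are handled by the general formula that uses the l_i counts.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)               # random order among tied x values...
    order.sort(key=lambda i: x[i])   # ...kept by Python's stable sort
    y_by_x = [y[i] for i in order]

    sorted_y = sorted(y_by_x)
    # r_i = #{j : y_j <= y_(i)},  l_i = #{j : y_j >= y_(i)}
    r = [bisect.bisect_right(sorted_y, v) for v in y_by_x]
    l = [n - bisect.bisect_left(sorted_y, v) for v in y_by_x]

    numerator = n * sum(abs(r[i + 1] - r[i]) for i in range(n - 1))
    denominator = 2 * sum(li * (n - li) for li in l)
    if denominator == 0:
        raise ValueError("xi is undefined when y is constant")
    return 1 - numerator / denominator

# Illustrative data: y = sin(x) plus Gaussian noise of increasing scale
rng = random.Random(42)
xs = [rng.uniform(0, 4 * math.pi) for _ in range(500)]
for noise in [0.0, 0.1, 0.5, 1.0, 2.0]:
    ys = [math.sin(v) + noise * rng.gauss(0, 1) for v in xs]
    print(f"noise={noise:<4} xi={xi_correlation(xs, ys, seed=0):.3f}")
```

With no noise, the coefficient should be close to 1 even though sin(x) is far from monotonic over [0, 4π], and it should drift toward 0 as the noise grows. The exact values depend on the random draws, so none are shown here.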

Feel free to reuse the code if you need it!