There's been a lot of discussion over the years about R and Python in statistics/data science. Which language should you pick if you're starting out in data science? Which language is better at statistics? Which language is better for machine learning? Which language is better at ______?
Of course, some really awesome people have gone out of their way to try and help you answer these questions. For example, you can check out this infographic shared by Karlijn Willems on DataCamp. The analysis goes through many different categories and presents hard data for its readers. This post will do none of that.
Today we are going to ask the languages which of them is better. This should provide an easy answer for us to go by if they both agree on the winner.
```r
> 'R' > 'python'
[1] TRUE
```
```python
>>> 'python' > 'R'
True
```
Well... looks like we're still in the same spot. There isn't a clear winner that's going to solve data science for you or your organization. Both languages are great in different situations, and both languages have amazing open source communities behind them.
This will only be a brief look at how each language compares strings. For R, you can read the `Comparison` help page (`?Comparison`); its Details section covers string comparisons. For Python, you can check out the string comparison section in this post from The Python Guru.
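Python's side of the answer is simple enough to sketch. The helper below, `code_point_greater` (a hypothetical name for illustration, not part of Python), walks both strings in step and compares Unicode code points with `ord()` at the first position where they differ. If one string is a prefix of the other, the longer string is the greater one. It only models Python's behavior; R's result depends on the collation rules of the locale it runs in and is not covered here. The loop checks the sketch against the built-in `>` on a few case and leetspeak variants.

```python
def code_point_greater(a, b):
    """Sketch of Python's str '>' rule: compare code points position by
    position; if one string is a prefix of the other, the longer one wins."""
    for ca, cb in zip(a, b):
        if ca != cb:
            # The first differing character decides the comparison.
            return ord(ca) > ord(cb)
    # No differing character: only length can break the tie.
    return len(a) > len(b)


pairs = [('python', 'R'), ('Python', 'R'), ('python', 'r'), ('pyth0n', 'R')]
for a, b in pairs:
    # The sketch should agree with Python's built-in comparison.
    assert code_point_greater(a, b) == (a > b)
    print(f'{a!r} > {b!r}: {a > b} (first chars: {ord(a[0])} vs {ord(b[0])})')
```

Capitalizing a single letter is enough to flip the verdict: `'Python' > 'R'` is `False` because `'P'` is code point 80 and `'R'` is 82.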
In this post, we're going to ignore the details and the whys of the comparison. We're just going to sort all the letters & numbers. This shows us how things could pan out differently if we wrote the comparison using different cases or leetspeak replacements.
```r
> # create and sort a vector of letters & digits
> sorted_chars = sort(c(LETTERS, letters, 0:9))
> collapsed_sorted_chars = paste(sorted_chars, collapse='')
>
> print(collapsed_sorted_chars)
[1] "0123456789aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ"
```
```python
>>> import string
>>>
>>> # create and sort a list of letters & digits
>>> sorted_chars = sorted(string.ascii_letters + string.digits)
>>> collapsed_sorted_chars = ''.join(sorted_chars)
>>>
>>> print(collapsed_sorted_chars)
0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
```
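If you want Python to produce the same order R printed above, one option is a sort key. The sketch below uses a hypothetical `r_like_key` that groups each letter with its other case and puts the lowercase form first; digits have no case, so they keep their code-point order and land in front. This is an assumption-laden approximation that only reproduces the R output for ASCII letters and digits. It is not a real locale-aware collation, which R derives from the locale it runs in.

```python
import string


def r_like_key(ch):
    # Primary key: the lowercase form, so 'a' and 'A' sort next to each other.
    # Secondary key: isupper() is False for lowercase, so 'a' comes before 'A'.
    # Digits sort by their own code points, all of which are below 'a'.
    return (ch.lower(), ch.isupper())


chars = string.ascii_letters + string.digits
r_like_order = ''.join(sorted(chars, key=r_like_key))
print(r_like_order)
```

For these 62 characters the printed string matches R's `"0123456789aAbB...zZ"` output, while Python's default `sorted()` keeps all uppercase letters ahead of all lowercase ones.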