My journey through #100DaysOfCode is off to an excellent start! I feel like I have learned more in the past 10 days by working on projects than I would have in 50 days of watching tutorials or reading books. Don't get me wrong, books and tutorials can be an essential part of the learning process. However, I am starting to see that coding is a skill, and I think it is better to learn a skill by doing than by studying.
I have spent these first 10 days working on building a correlation calculator in Python. The goal is to have the user input values from the command line into two lists, and then find the correlation coefficient between those two lists. There are a few additional steps between the raw scores and the correlation calculation, but I won't get too deep into the details quite yet. (More information at the bottom.) The highlights are that in order to calculate the correlation coefficient between two sets of scores, I needed to calculate the mean, standard deviation, and Z scores of each data set. I wrote functions for each of these calculations, but ran into quite a bit of trouble with a couple of them.
When writing the function for the standard deviation, I was trying to get a little too fancy and do it all in one line. No matter how many times I tried to rearrange variables or add/remove parentheses, it wasn't working. Then I remembered KISS: Keep It Simple, Stupid. So I broke the formula down into individual steps, and was able to write a working function that returned the standard deviation of a list of raw scores. I learned that Python's statistics module has a built-in standard deviation function, but this was a learning exercise, so I decided to write one on my own.
The next bump in the road was the correlation function itself. I knew the function had to take in two lists and, during the calculation, multiply each element of one list by the element at the same index in the other list. For example, list1[0] * list2[0], list1[1] * list2[1], and so on. I tried nested for loops, multiplying the two lists together directly, and a few other things, all to no avail. Then the light bulb went off. Why don't I just google it? After searching "how to multiply two lists in Python", I learned about the zip() function. To be honest, I still don't have a great understanding of how the zip() function works, but after seeing an example I gave it a shot. It worked! I have a script that takes in values for two lists from the command line and outputs the correlation between the two lists!
Thank you for reading, and stay tuned for updates as I document my journey through #100DaysOfCode here on dev.to.
For the curious readers out there, here is some further information about the statistics and code:
Population correlation coefficient formula - The correlation coefficient is the sum, over all the people in the study, of the product of each person's two Z scores, then divided by the number of people. Here is an image of the formula.
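As a rough sketch of the formula described above (not necessarily how the code in the repo does it), the calculation might look like this. The function names `z_scores` and `correlation` and the sample data are made up for illustration, and the sketch leans on `statistics.mean` and `statistics.pstdev` for brevity instead of hand-written helpers.

```python
from statistics import mean, pstdev


def z_scores(scores):
    """Convert raw scores to Z scores using the population standard deviation."""
    m = mean(scores)
    sd = pstdev(scores)  # population version: divides by the number of scores
    return [(x - m) / sd for x in scores]


def correlation(xs, ys):
    """Sum the products of paired Z scores, then divide by the number of pairs."""
    if len(xs) != len(ys):
        raise ValueError("both lists must have the same length")
    zx = z_scores(xs)
    zy = z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)


# Hypothetical data: the second list is exactly double the first,
# so the result should be very close to 1.0 (a perfect positive correlation).
hours = [1, 2, 3, 4, 5]
points = [2, 4, 6, 8, 10]
print(correlation(hours, points))
```

Note that this sketch divides by zero if every score in a list is the same, since the standard deviation is then 0; a real script would probably want to check for that.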
Standard deviation formula - The standard deviation is the square root of the variance, and the variance is the sum of the squared deviations of the scores from the mean, divided by the number of scores. Here is an image of the formula.
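Broken into one step per line, the formula above could be sketched like this. The function name `population_std_dev` and the sample scores are assumptions for illustration, not the actual code from the repo.

```python
import math


def population_std_dev(scores):
    """Population standard deviation, one step at a time."""
    # Step 1: the mean of the raw scores.
    mean = sum(scores) / len(scores)
    # Step 2: each score's squared deviation from the mean.
    squared_deviations = [(x - mean) ** 2 for x in scores]
    # Step 3: the variance is the average of the squared deviations.
    variance = sum(squared_deviations) / len(scores)
    # Step 4: the standard deviation is the square root of the variance.
    return math.sqrt(variance)


print(population_std_dev([2, 4, 4, 4, 5, 5, 7, 9]))  # 2.0
```

In the statistics module, `pstdev()` is the function that matches this population formula. `stdev()` divides by the number of scores minus one, so it gives a slightly larger result on the same data.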
The zip() function - Here is the documentation for the zip() function. Quite honestly, I don't understand it well enough (yet) to feel comfortable explaining it.
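For reference, here is a small sketch of the one behavior that mattered for this project: `zip()` walks through both lists at the same time and hands back one pair per index. The list names and values are made up for illustration.

```python
list1 = [1, 2, 3]
list2 = [4, 5, 6]

# zip() pairs up the elements that share an index.
pairs = list(zip(list1, list2))
print(pairs)  # [(1, 4), (2, 5), (3, 6)]

# Unpacking each pair makes the index-by-index multiplication a one-liner.
products = [a * b for a, b in zip(list1, list2)]
print(products)  # [4, 10, 18]
```

One thing to watch for: if the lists have different lengths, `zip()` stops at the end of the shorter one without raising an error.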
The source code - If you want to see the source code for my correlation calculator, here is the link to the repo.
Thanks again for reading!