Part of this project studies the popularity on Stack Overflow of three of the most widely used programming languages for Machine Learning (ML) and Big Data (BD) projects: Python, R and Scala. The project addresses two research questions: how popular is each programming language for a given topic, and how is that popularity distributed across countries?
Analyzing the Stack Overflow dataset for the MBD course
Toolkit: HTML, CSS, jQuery, D3.js, PySpark (Python) for Big Data
The interactive map we created to show the distribution of popularity across countries is available via the link above.
In Machine Learning, Python is more popular than R and Scala. The top three countries are India, the United States of America and France, with 196, 102 and 39 questions respectively.
In Big Data, Sweden ranks first among countries using Scala, followed by the United States of America. In India, Python is more popular than R and Scala.
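Per-country counts like the ones above come from grouping questions by topic, language and country. The project did this with PySpark. The snippet below is only a plain-Python sketch of the same aggregation, not the project's actual code. The tag sets in `TOPIC_TAGS`, the record fields (`tags`, `country`), the helper names and the sample records are all assumptions made up for illustration.

```python
from collections import Counter

# Hypothetical tag sets: the tags the project actually used to define
# each topic are not stated, so these are illustrative only.
LANGUAGES = {"python", "r", "scala"}
TOPIC_TAGS = {
    "Machine Learning": {"machine-learning", "scikit-learn", "tensorflow"},
    "Big Data": {"apache-spark", "hadoop", "bigdata"},
}


def count_questions(questions):
    """Count questions per (topic, language, country).

    Each question is assumed to be a dict with a list of tags and the
    asker's country. A question counts towards a topic if it carries at
    least one of that topic's tags, and towards every language it is
    tagged with.
    """
    counts = Counter()
    for question in questions:
        tags = set(question["tags"])
        for topic, topic_tags in TOPIC_TAGS.items():
            if tags & topic_tags:
                for language in tags & LANGUAGES:
                    counts[(topic, language, question["country"])] += 1
    return counts


def top_countries(counts, topic, language, n=3):
    """Return the n countries with the most questions for a topic and language."""
    per_country = Counter()
    for (t, lang, country), value in counts.items():
        if t == topic and lang == language:
            per_country[country] += value
    return per_country.most_common(n)


# Made-up sample records, not taken from the Stack Overflow dataset.
sample = [
    {"tags": ["python", "machine-learning"], "country": "India"},
    {"tags": ["python", "tensorflow"], "country": "India"},
    {"tags": ["r", "machine-learning"], "country": "France"},
    {"tags": ["scala", "apache-spark"], "country": "Sweden"},
    {"tags": ["python", "pandas"], "country": "India"},  # no topic tag, ignored
]

counts = count_questions(sample)
print(top_countries(counts, "Machine Learning", "python"))
```

In PySpark the same idea would typically be expressed as a filter on tags followed by a group-by on language and country, which lets the counting scale to the full dataset.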
This was our first project working with Big Data. We learnt a lot about how to process the data and built the visualization as part of our course. We look forward to using these skills in future projects and to learning more.