I was looking for a tutorial to recommend to an acquaintance who is moving into digital journalism, and I came across your post. It is very well-written. Thanks for sharing!
Just a short remark: you are using Pandas, but not to its full potential.
When you observe a possible relationship between RTs and Likes in subsection 2.1, you can quantify it by computing the (Pearson) correlation:
data['RTs'].corr(data['Likes'])
(It is close to 0.7.)
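For anyone who wants to try this without the tweet data, here is a minimal sketch on a toy frame. Only the column names `RTs` and `Likes` come from the post; the numbers are made up, so the result will not match the 0.7 above. The Spearman line is an optional extra, not something the post uses.

```python
import pandas as pd

# Toy stand-in for the post's `data` frame; the values are invented for illustration.
data = pd.DataFrame({
    "RTs":   [10, 25, 3, 40, 18],
    "Likes": [30, 60, 12, 95, 41],
})

# Series.corr uses method="pearson" by default.
pearson = data["RTs"].corr(data["Likes"])

# Rank-based alternative, less sensitive to a handful of very popular tweets.
spearman = data["RTs"].corr(data["Likes"], method="spearman")

print(f"Pearson:  {pearson:.3f}")
print(f"Spearman: {spearman:.3f}")
```

Calling `data[['RTs', 'Likes']].corr()` instead returns the full correlation matrix as a data frame.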
When finding the sources of tweets in subsection 2.3, instead of using loops, you can write
sources = data['Source'].unique()
and then, when computing percentages,
data['Source'].value_counts()
You can put the latter in a data frame... In any case, thanks again!
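A minimal sketch of the source counting, assuming a `Source` column like the one in the post. The source names and rows below are invented, and the `Count` and `Percent` column names are my own choice. Note that `value_counts()` on its own returns counts; passing `normalize=True` returns fractions, which is one way to get percentages.

```python
import pandas as pd

# Toy stand-in for the post's `data` frame; the source names are hypothetical.
data = pd.DataFrame({
    "Source": [
        "Twitter for iPhone",
        "Twitter for Android",
        "Twitter for iPhone",
        "Twitter Web Client",
    ]
})

# Distinct sources, in order of first appearance (no loop needed).
sources = data["Source"].unique()

# Counts per source, sorted from most to least frequent,
# put into a data frame with explicit column names.
summary = (
    data["Source"]
    .value_counts()
    .rename_axis("Source")
    .reset_index(name="Count")
)

# Share of each source as a percentage of all tweets.
summary["Percent"] = 100 * summary["Count"] / summary["Count"].sum()

# The same shares in one call, as fractions between 0 and 1.
fractions = data["Source"].value_counts(normalize=True)

print(sources)
print(summary)
```

The `summary` frame can then be passed to a plotting call in place of the hand-built lists.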
I should say that this was for an introductory workshop, and I finished all the material at dawn about three days before it. :P
So most of the last part is probably not optimized code. :(
Thanks for your observations! :D
They simplify the data handling using the potential of Pandas. :)