DEV Community

Rodolfo Ferro

Sentiment analysis on Trump's tweets using Python 🐍

Top comments (98)

Ben Halpern

Fascinating. I wouldn't be surprised if this kind of research goes mainstream in journalism in the future.

Rodolfo Ferro

Exactly, I think the same. It could also help increase the impact of your products on social media if you're trying to sell stuff. There are many possibilities.

One project I have in mind is applying this kind of analysis to suicide prevention. 👍

Lasse Schultebraucks

Nice, very interesting! He seems to post a surprisingly high share of positive tweets (51%). But how many of these tweets are fake news and lies is another question... nytimes.com/interactive/2017/06/23...

Rodolfo Ferro

Yeah, that surprised me too! I've heard that he's not the only one tweeting from his account; he has a team for this. That might be one reason. That's why it's interesting to analyze the polarity of tweets coming from different sources.
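If the tweets are already in the tutorial's pandas DataFrame, something like data.groupby('Source')['SA'].mean() would give the average polarity per source (assuming the source and sentiment columns are named 'Source' and 'SA'). The same idea in plain Python, as a small sketch with a hypothetical list of (source, polarity) pairs as input:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def mean_polarity_by_source(rows: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Average sentiment score per tweet source (e.g. 'Twitter for iPhone')."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for source, polarity in rows:
        totals[source] += polarity
        counts[source] += 1
    # Divide each source's summed polarity by its number of tweets.
    return {source: totals[source] / counts[source] for source in totals}
```

A clearly higher or lower average for one client (say, a phone versus a media tool) would hint that different people or workflows are behind those tweets, though a handful of tweets per source is not much evidence.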

Murtaza Haji

Well, technically these sentiment scores should be taken with a grain of salt. You could use the VaderSentiment library as well and compare both sets of scores to get better insight.
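A small sketch of that comparison, with the two scorers passed in as functions so any library can be plugged in. With TextBlob and vaderSentiment installed, they could be `lambda t: TextBlob(t).sentiment.polarity` and `lambda t: analyzer.polarity_scores(t)["compound"]`, where `analyzer = SentimentIntensityAnalyzer()` comes from `vaderSentiment.vaderSentiment`. The ±0.05 neutral band is the cut-off the VADER README suggests for its compound score; applying the same band to TextBlob is an assumption.

```python
from typing import Callable, Iterable, List, Tuple

Scorer = Callable[[str], float]


def to_label(score: float, threshold: float = 0.05) -> int:
    """Map a polarity score to 1 (positive), 0 (neutral) or -1 (negative)."""
    if score >= threshold:
        return 1
    if score <= -threshold:
        return -1
    return 0


def compare_scorers(texts: Iterable[str], score_a: Scorer, score_b: Scorer,
                    threshold: float = 0.05) -> Tuple[float, List[str]]:
    """Return the fraction of texts both scorers label the same way,
    plus the texts they disagree on (worth reading by hand)."""
    items = list(texts)
    disagreements = [
        t for t in items
        if to_label(score_a(t), threshold) != to_label(score_b(t), threshold)
    ]
    agreement = 1 - len(disagreements) / len(items) if items else 1.0
    return agreement, disagreements
```

A low agreement rate, or many disagreements on sarcastic or all-caps tweets, is a sign that either score alone should not be trusted for those cases.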

Ibrahimkasim

Awesome tutorial!!

Rodolfo Ferro

Thank you so much! 😀👍🏼

Alex Singleton

Hi there, I was having some trouble with the "visualizing the statistics" section (subsections 2.1 and 2.2). If you take a look at my GitHub repo, you'll notice I had to comment out %matplotlib inline (# %matplotlib inline) and replace it with plt.ion() in the script file (trumpet.py) in order to run it without errors (e.g. python3 trumpet.py). Can you please explain how to generate the visualizations shown in those sections? For some reason, I'm unable to render those visuals in my Jupyter Notebook environment. I've only been using Python for 10 days, so I'd appreciate any guidance. Great tutorial, thanks!

Rodolfo Ferro

Sure! It's quite easy actually. :)

Instead of adding plt.ion() at the beginning, you can call plt.show() each time you generate a plot in order to display it. This opens an external window showing the plot you just created.

You can see this in the Official Pyplot tutorial I shared at the end (References).

Please let me know if you have any other problems. :)
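For reference, a minimal script following that advice. The function name plot_likes_and_retweets and the numbers are hypothetical stand-ins for the tutorial's DataFrame columns; the point is only where plt.show() goes when the code runs as `python3 trumpet.py` instead of in a notebook.

```python
import matplotlib.pyplot as plt


def plot_likes_and_retweets(likes, retweets):
    """Draw likes and retweets over time on one figure and return it."""
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.plot(likes, label="Likes")
    ax.plot(retweets, label="Retweets")
    ax.legend()
    return fig


if __name__ == "__main__":
    # Hypothetical numbers; in the tutorial they would come from the DataFrame.
    plot_likes_and_retweets([120, 340, 90, 410], [40, 95, 30, 150])
    # Outside a notebook there is no %matplotlib inline, so nothing appears
    # until plt.show() is called; it opens a window and waits until it is closed.
    plt.show()
```

Calling plt.show() once at the end displays every figure created so far; calling it after each plot shows them one at a time instead.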

Alex Singleton

Got it, Rodolfo! Thank you for the guidance. Tremendous fun! ;)

Elias Said

I've been trying to run the script, but I'm running into several problems that maybe you can help me with.

I'm new to Python, but I'd love to adapt this example to other users if I can get it working.

I have Python 3.6.3 and I work with Spyder. I copied your example, but the script stops at line 37:

# We create an extractor object:
extractor = twitter_setup()

And this error appears:


extractor = twitter_setup()

NameError: name 'twitter_setup' is not defined


What's causing this?

Thanks for your guidance!

Rodolfo Ferro

It's because you haven't defined your twitter_setup() function.

Make sure that in Spyder (specifically, in your code) you have the following defined:

# We import Tweepy to talk to the Twitter API:
import tweepy

# We import our access keys:
from credentials import *    # This will allow us to use the keys as variables

# API's setup:
def twitter_setup():
    """
    Utility function to setup the Twitter's API
    with our access keys provided.
    """
    # Authentication and access using keys:
    auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
    auth.set_access_token(ACCESS_TOKEN, ACCESS_SECRET)

    # Return API with authentication:
    api = tweepy.API(auth)
    return api

Another recommendation is to try using Jupyter notebooks. :)

Elias Said

Thank you so much, Rodolfo, you're truly impressive!!!
I've already solved this issue... let's see if nothing else comes up and I can explore the results on my own!
Hugs

Sebastian-Nielsen

How did you write code in your comment with syntax highlighting?

Fabián S.

Hi Rodolfo!

I have a question: is it possible to modify the tweet-cleaning function so that it doesn't remove accents from Spanish words?

Rodolfo Ferro

Sure, you'd really just need to modify your cleaning rule in re.sub(). :)
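A minimal sketch of that change, assuming a cleaning function built on a single re.sub() call that strips mentions, URLs and every character outside an ASCII whitelist (the tutorial's exact pattern may differ). Adding the Spanish accented vowels, ü and ñ to the whitelisted character class keeps them; the pattern and the name clean_tweet are illustrative.

```python
import re

# Hypothetical cleaning pattern: mentions | any character outside the
# whitelist | URLs. The whitelist now includes Spanish accented letters.
CLEAN_PATTERN = re.compile(
    r"(@[A-Za-z0-9_]+)"
    r"|([^0-9A-Za-zÁÉÍÓÚÜÑáéíóúüñ \t])"
    r"|(\w+://\S+)"
)


def clean_tweet(tweet: str) -> str:
    """Remove mentions, URLs and punctuation while keeping Spanish accents."""
    # Replace every match with a space, then collapse repeated whitespace.
    return " ".join(CLEAN_PATTERN.sub(" ", tweet).split())
```

Using `[^\w \t]` instead of an explicit list would also keep accents (Python 3's `\w` is Unicode-aware), but it keeps underscores and letters from every other alphabet too.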

Rich

Hi Rodolfo, great article!

I'm new to Python and wondering how to retrieve more than the default 15 tweets with this code. I looked up a few solutions elsewhere but couldn't figure out how to integrate them. Any suggestions?

Thanks again! - Rich

Rodolfo Ferro

We're actually retrieving the 200 most recent tweets; this is specified by the count parameter:

tweets = extractor.user_timeline(screen_name="realDonaldTrump", count=200)

The API allows us to retrieve at most 200 tweets per call.
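To go beyond 200, the usual approach is to page backwards with the max_id parameter: each request asks only for tweets older than the oldest one already received. With Tweepy 3.x, `tweepy.Cursor(extractor.user_timeline, screen_name="realDonaldTrump", count=200).items(1000)` does this for you. The sketch below shows the same logic by hand, with a hypothetical `fetch_page(max_id)` callback so it can be tried without API keys. (The v1.1 user timeline endpoint is documented to reach back only about 3,200 tweets in total.)

```python
from typing import Any, Callable, List, Optional, Sequence

# fetch_page(max_id) is a hypothetical callback: it must return tweets
# newest-first (each with an integer `.id`), only those with id <= max_id,
# or the newest page when max_id is None.
FetchPage = Callable[[Optional[int]], Sequence[Any]]


def fetch_timeline(fetch_page: FetchPage, limit: int) -> List[Any]:
    """Collect up to `limit` tweets by paging backwards with max_id."""
    collected: List[Any] = []
    max_id: Optional[int] = None
    while len(collected) < limit:
        page = fetch_page(max_id)
        if not page:  # nothing older is available
            break
        collected.extend(page)
        # The next request starts just below the oldest tweet received so far.
        max_id = page[-1].id - 1
    return collected[:limit]
```

With the tutorial's extractor, fetch_page could wrap `extractor.user_timeline(screen_name="realDonaldTrump", count=200, max_id=...)`, passing max_id only when it is not None.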

Rich

Just went through the post again and found:

# We create a tweet list as follows:
tweets = extractor.user_timeline(screen_name="realDonaldTrump", count=200)
print("Number of tweets extracted: {}.\n".format(len(tweets)))

# We print the most recent 5 tweets:
print("5 recent tweets:\n")
for tweet in tweets[:5]:
    print(tweet.text)
    print()

Which I've replaced with:
tweets = extractor.search(input("Topic you want to analyze: "))

Perhaps I need to play with this; if I can't figure it out, I'll ask again. Lol, my apologies!

Fixed it with a very simple change:
tweets = extractor.search(input("Topic you want to analyze: "), count=200)

Thanks!

Rich

However, even with count=200, this only retrieves 100 tweets.

Is there a way to refresh and retrieve more?

Thanks again!

Caleb David

@deuxperspective You can use a tool I built, hosted on RapidAPI

rapidapi.com/microworlds/api/twitt...

Florian Rohrer

Thank you for your tutorial! It was easy to follow, and everything worked on my first attempt!

I don't want to reload all the tweets from the web while I'm developing, so I altered the first few lines to cache the tweets locally.

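import os
import pickle

# Cache file stored next to this script (same path for the check and the open):
save = os.path.join(os.path.dirname(os.path.abspath(__file__)), "saved.pickle")
if os.path.exists(save):
    # Reuse the tweets downloaded in a previous run:
    with open(save, 'rb') as f:
        tweets = pickle.load(f)
else:
    extractor = twitter_setup()
    tweets = extractor.user_timeline(screen_name="realDonaldTrump", count=200)
    # Cache them for the next run:
    with open(save, 'wb') as f:
        pickle.dump(tweets, f)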

Rodolfo Ferro

Excellent idea!

What I ended up doing (in my own case) was saving the tweet list as a CSV file (data.to_csv(...)), taking advantage of the fact that I already had all the info in a pandas DataFrame. :)

Thanks for your great comment!
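A sketch of that CSV approach as a load-or-download helper, assuming the tweets are already in a DataFrame. load_or_fetch and fetch are hypothetical names: fetch stands for whatever code downloads the tweets and builds the DataFrame. Note that read_csv returns plain columns, so dates, for example, come back as strings unless you pass parse_dates.

```python
import os
import tempfile
from typing import Callable

import pandas as pd


def load_or_fetch(path: str, fetch: Callable[[], pd.DataFrame]) -> pd.DataFrame:
    """Return cached tweets from `path`, calling `fetch()` only if no cache exists."""
    if os.path.exists(path):
        return pd.read_csv(path)
    data = fetch()
    # index=False avoids writing the row index as an extra column.
    data.to_csv(path, index=False)
    return data


if __name__ == "__main__":
    def fake_fetch() -> pd.DataFrame:
        # Stand-in for the Tweepy download (hypothetical values).
        return pd.DataFrame({"Tweets": ["Hello, world"], "Likes": [10], "RTs": [2]})

    with tempfile.TemporaryDirectory() as tmp:
        cache = os.path.join(tmp, "tweets.csv")
        first = load_or_fetch(cache, fake_fetch)   # "downloads" and writes the CSV
        second = load_or_fetch(cache, fake_fetch)  # reads the CSV back
        print(first.equals(second))
```

Unlike a pickle of Tweepy objects, the CSV is readable in any spreadsheet tool, at the cost of keeping only the columns you put in the DataFrame.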

Fred.

Would it be possible to detect how many likes come from a VIP's staff? It is said that many politicians manage likes and retweets by asking their supporters to like and retweet their messages (not sure I'm being clear). Across 200 tweets, it should be possible to identify the Twitter accounts that like systematically and quickly (as soon as a tweet is published, as bots do), then subtract (or down-weight) them in the final evaluation.
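A rough sketch of that heuristic, assuming you have already collected, for each tweet, which accounts reacted and how many seconds after publication (how to obtain that data from the API is not covered here). max_delay and min_share are illustrative thresholds, not tuned values.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def flag_fast_reactors(
    reactions: Dict[int, Iterable[Tuple[str, float]]],
    max_delay: float = 60.0,
    min_share: float = 0.8,
) -> Set[str]:
    """Return accounts that reacted within `max_delay` seconds on at least
    `min_share` of all tweets.

    `reactions` maps tweet id -> iterable of (account, seconds after posting).
    """
    fast_counts: Dict[str, int] = defaultdict(int)
    for tweet_reactions in reactions.values():
        # Count each account at most once per tweet.
        fast_accounts = {acc for acc, delay in tweet_reactions if delay <= max_delay}
        for acc in fast_accounts:
            fast_counts[acc] += 1
    n_tweets = len(reactions)
    return {acc for acc, count in fast_counts.items() if count / n_tweets >= min_share}
```

The flagged accounts could then be excluded or given less weight when counting likes and retweets.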

Rodolfo Ferro

This is an interesting question.

If you want to count something like this in real time, you would need to change the way you're consuming the API (currently REST) and create a listener (you can still do that with Tweepy). That's what I would do: create a specific listener for Trump's tweets and use threads to count the likes and retweets each new tweet receives over a certain period of time.

Does this answer help? I can try to be more explicit. :)
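A rough, standard-library sketch of the threaded counting part. get_counts is a hypothetical callback you would implement (for example, by looking the tweet up again through the REST API), and interval and samples are illustrative. In a Tweepy 3.x streaming setup, one tracker could be started per new tweet from the listener's on_status(status) method.

```python
import threading
import time
from typing import Callable, List, Tuple

Sample = Tuple[float, int, int]  # (seconds since tracking started, likes, retweets)


def track_engagement(
    tweet_id: int,
    get_counts: Callable[[int], Tuple[int, int]],
    interval: float,
    samples: int,
) -> Tuple[threading.Thread, List[Sample]]:
    """Start a background thread that samples likes/retweets of one tweet.

    Returns the thread and the list it fills; join() the thread before
    reading the full history.
    """
    history: List[Sample] = []

    def worker() -> None:
        start = time.monotonic()
        for _ in range(samples):
            likes, retweets = get_counts(tweet_id)
            history.append((time.monotonic() - start, likes, retweets))
            time.sleep(interval)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread, history
```

Keep the Twitter rate limits in mind when choosing interval: each sample is one API call per tracked tweet.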

Fred.

Yes, I understand the idea. This would be a very useful tool for tracking accounts with fake popularity.

Rodolfo Ferro

This might help: github.com/RodolfoFerro/TwitterBot...

You can find more info in the documentation: tweepy.readthedocs.io/en/v3.5.0/st...

Hope this complements my previous answer! 👍🏼

corbeau

I was looking for a tutorial to recommend to an acquaintance who is moving into digital journalism, and I came across your post. It is very well-written. Thanks for sharing!
This is just a short remark, since you seem to be using Pandas, but not to its fullest potential.
When you observe a possible relationship between RTs and Likes in subsection 2.1, you can quantify it by computing the (Pearson) correlation:

data['RTs'].corr(data['Likes'])

(It is close to 0.7.)

When finding the sources of tweets in subsection 2.3, instead of using loops, you can use

sources = data['Source'].unique()

and then, when computing percentages,

data['Source'].value_counts()

You can put the latter in a data frame... In any case, thanks again!
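Putting those suggestions together, a sketch on a small hypothetical DataFrame (the column names 'Source', 'Likes' and 'RTs' follow the snippets above; the values are made up). value_counts(normalize=True) returns fractions, so multiplying by 100 gives percentages, and to_frame() turns the result into a DataFrame.

```python
import pandas as pd

# Toy stand-in for the tutorial's DataFrame (hypothetical values).
data = pd.DataFrame({
    "Source": ["Twitter for iPhone", "Twitter for iPhone", "Media Studio", "Twitter for iPad"],
    "Likes": [100, 200, 50, 80],
    "RTs": [30, 70, 10, 20],
})

# Distinct sources, in order of first appearance.
sources = data["Source"].unique()

# Share of tweets per source, in percent, as a one-column DataFrame.
percent = (data["Source"].value_counts(normalize=True) * 100).rename("Percent").to_frame()

# Pearson correlation between retweets and likes.
rt_like_corr = data["RTs"].corr(data["Likes"])

print(percent)
print(rt_like_corr)
```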

Rodolfo Ferro

I must say this was for an introductory workshop, and I finished all the material in the early hours, three days before or something like that. :P
It's possible that most of the code in the last part is not optimized. :(

Thanks for your observations! :D
They simplify the data handling by making better use of Pandas. :)

Sebastian-Nielsen

for source in data['Source']:
    for index in range(len(sources)):
        if source == sources[index]:
            percent[index] += 1
            pass

Why did the author write 'pass' on the last line?

Rodolfo Ferro

Sorry, my bad.

When I was writing the code, I first created an empty conditional, so I put the pass keyword in it as a placeholder, and afterwards I forgot to take it out. It has no effect on the result.

Mab Reyaz

Hey mate, thanks for this tutorial. It seems to work fine with any hashtag except #LetsTaxThis. Would you mind having a look and updating it? That would be very helpful.

Basically, I want to extract data from Twitter using the #LetsTaxThis hashtag.

Thanks in advance :) :)

Jude

Nicely done. I had installed Anaconda before but never really got past Hello World in the Jupyter notebook. This was an excellent way to get people like me off our proverbial rear ends and use it for a very fun idea! I was able to follow it all the way through and get everything working after dusting off the cobwebs of my Anaconda environment.

Thanks for sharing!

Rodolfo Ferro

Thank you so much! I really appreciate it.

I'll try to keep posting stuff like this; I enjoy doing applied things with Python. :)

Rahul Kaushik

Hi Rodolfo, thanks a lot for a very comprehensive tutorial. However, I still can't get past the credentials import problem:

ModuleNotFoundError: No module named 'credentials'

I saw in the discussion that you mentioned a solution, but I am very new to Python, so I still couldn't figure it out. Can you please describe what the credentials.py file should look like (of course, leaving blank spaces where I can put my own credentials)? Thanks a lot.
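A minimal credentials.py, assuming the four variable names used by the twitter_setup() function quoted earlier in this thread (CONSUMER_KEY, CONSUMER_SECRET, ACCESS_TOKEN, ACCESS_SECRET). It typically has to sit in the same folder as the script or notebook that runs `from credentials import *`; otherwise Python cannot find the module and raises the ModuleNotFoundError above.

```python
# credentials.py: keep this file in the same folder as your script/notebook,
# and never commit your real keys to a public repository.
# Replace the empty strings with the keys of your Twitter app.

CONSUMER_KEY = ""     # API key
CONSUMER_SECRET = ""  # API secret key
ACCESS_TOKEN = ""     # Access token
ACCESS_SECRET = ""    # Access token secret
```

Adding credentials.py to your .gitignore is a simple way to avoid publishing the keys by accident.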

Rahul Kaushik

Hi Rodolfo, I figured out the solution, and your code worked like a charm. It's awesome.

Aidila Razak

Hi! Thanks for the tutorial.

I noticed that retweets (RTs) are not printed in full. How do I get the full RT text?

I am able to un-truncate a tweet using this:

if tweet['truncated']:
    tweet_text = tweet['extended_tweet']['full_text']
else:
    tweet_text = tweet['text']

but it doesn't work for retweets.

Does anyone know how I can get the full text of RTs?

I need it to get an accurate sentiment analysis.

Many thanks for the help!

Rodolfo Ferro

Hi!

One possible approach would be adding the tweet_mode parameter as follows:

tweets = extractor.user_timeline(screen_name="realDonaldTrump", count=200, tweet_mode="extended")

Let me know if that does the trick. :)
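With tweet_mode="extended", Tweepy Status objects expose full_text instead of text. For a retweet, however, full_text is still the "RT @user: ..." string, which can be cut off; the complete original text lives in the retweeted_status attribute. A small helper sketched under that assumption (get_full_text is a hypothetical name, and the demo uses stand-in objects rather than real API results):

```python
from types import SimpleNamespace


def get_full_text(status) -> str:
    """Untruncated text of a tweet fetched with tweet_mode="extended".

    Retweets carry the original tweet in `retweeted_status`, whose
    `full_text` is complete; ordinary tweets have no such attribute.
    """
    original = getattr(status, "retweeted_status", None)
    if original is not None:
        return original.full_text
    return status.full_text


# Demo with stand-in objects mimicking the attributes of Tweepy Status objects.
retweet = SimpleNamespace(
    full_text="RT @someone: The beginning of a long tweet that gets cut o…",
    retweeted_status=SimpleNamespace(
        full_text="The beginning of a long tweet that gets cut off in the retweet."
    ),
)
plain = SimpleNamespace(full_text="A regular tweet.")
print(get_full_text(retweet))
print(get_full_text(plain))
```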

Sardar Zafar Iqbal

I am running this script in PyCharm. Everything is working fine, but it does not show the sentiment analysis part: no error, no output. Below is the last part of the output, which does not include the sentiment analysis. Can anybody help me?

Number of retweets: 63927
139 characters.

Creation of content sources:

  • Twitter for iPhone
  • Media Studio
  • Twitter for iPad

Process finished with exit code 0
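One common cause when moving notebook code into a script (hard to confirm without seeing the code): a notebook automatically displays the value of the last expression in a cell, but a script run from PyCharm does not, so results never passed to print() simply vanish, and plots also need plt.show(). A sketch of a sentiment summary with explicit printing, assuming scores where > 0 means positive, 0 neutral and < 0 negative (polarity_breakdown and the scores are hypothetical):

```python
from typing import Dict, Sequence


def polarity_breakdown(scores: Sequence[float]) -> Dict[str, float]:
    """Percentage of positive (> 0), neutral (== 0) and negative (< 0) scores."""
    total = len(scores)
    if total == 0:
        return {"positive": 0.0, "neutral": 0.0, "negative": 0.0}
    return {
        "positive": 100 * sum(s > 0 for s in scores) / total,
        "neutral": 100 * sum(s == 0 for s in scores) / total,
        "negative": 100 * sum(s < 0 for s in scores) / total,
    }


if __name__ == "__main__":
    breakdown = polarity_breakdown([1, 0, -1, 1])  # hypothetical scores
    # In a script, results appear only if printed explicitly.
    for label, pct in breakdown.items():
        print(f"Percentage of {label} tweets: {pct:.1f}%")
```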
