Expanding Self-Awareness using Amazon Comprehend (Part 1)

Jameson ・5 min read

As Engineers, many of us struggle with self-awareness. How do others perceive our actions and behaviors? Social interactions are rich with qualitative subtleties. As Engineers though, we respond most viscerally to data.

Today I'm going to study my Twitter account, @softwarejameson. How do my tweets make people feel? How can I improve?

To answer these questions, I'm going to write a Python script. The program will pull all of my tweets & their responses. I'll then run sentiment analysis on each.

We'll explore different aspects of this problem over a series of articles. This first article will mostly cover getting our feet wet with Twitter & Comprehend.

Boot-strappin' an Environment for Self Growth 🤙

I'm going to use Tweepy to interact with Twitter, and the AWS SDK for Python to interact with Amazon Comprehend's sentiment analysis functionality.

This will require developer accounts for both Twitter and AWS. If you already have dev accounts, just proceed to the Twitter Developer Portal & AWS Console to grab your access credentials.

Let's start with some boilerplate code for Twitter & AWS.

Tweepy's docs suggest installing the library with pip, then running a simple hello world program:

pip install tweepy

import tweepy

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

api = tweepy.API(auth)

public_tweets = api.home_timeline()
for tweet in public_tweets:
    print(tweet.text)

Remark: the consumer_key and consumer_secret are called "API key" and "API secret" on the Twitter Developer Portal.

Now let's get our "Hello World" for our sentiment analysis tool. The Amazon Comprehend documentation suggests:

import boto3
import json

comprehend = boto3.client(service_name='comprehend', region_name='region')

text = "It is raining today in Seattle"

print('Calling DetectSentiment')
print(json.dumps(comprehend.detect_sentiment(Text=text, LanguageCode='en'), sort_keys=True, indent=4))
print('End of DetectSentiment\n')

Remark: Be sure to replace region with an actual AWS region, such as us-east-1. Also ensure you are correctly supplying credentials to Boto3.
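For reference, here's a rough sketch of the response shape that DetectSentiment returns, and how to pull out the label along with its score. The sample_response dict is hand-written for illustration (the numbers are made up, not real Comprehend output), and summarize_sentiment is a hypothetical helper name, not part of Boto3.

```python
# Hand-written stand-in for a DetectSentiment response. The scores are
# illustrative values, not real output from Amazon Comprehend.
sample_response = {
    "Sentiment": "NEUTRAL",
    "SentimentScore": {
        "Positive": 0.05,
        "Negative": 0.02,
        "Neutral": 0.92,
        "Mixed": 0.01,
    },
}


def summarize_sentiment(response: dict) -> str:
    """Hypothetical helper: describe the winning label and its score."""
    label = response["Sentiment"]  # e.g. "NEUTRAL"
    # Score keys are capitalized ("Neutral"), while labels are upper-case.
    score = response["SentimentScore"][label.capitalize()]
    return "{} ({:.0%})".format(label.lower(), score)


print(summarize_sentiment(sample_response))
```

Working against a canned dict like this is a cheap way to shape your parsing code before spending real API calls.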

Okay! Both are running. Let's start adapting them to our specific use case.

First cut data

Let's try grabbing some tweet text, and running sentiment analysis on it.

# Dump out raw data from Twitter
user = 'softwarejameson'
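# screen_name names the account explicitly; newer Tweepy releases don't accept the generic id argument
cursor = tweepy.Cursor(api.user_timeline, screen_name=user)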
for tweet in cursor.items():
    print(json.dumps(tweet._json, indent=2))

This prints a ton of data. Let's focus on a few particular fields of interest. To make things more manageable, let's create a data model called TweetSummary:

class TweetSummary:
    def __init__(self, status: tweepy.Status):
        # Cherry-pick a few fields of interest
        self.author_id = status.user.id
        self.author_login = status.user.screen_name
        self.author_name = status.user.name
        self.id = status.id
        self.content = status.text
        self.favorite_count = status.favorite_count
        self.retweet_count = status.retweet_count

    def to_json(self):
        return json.dumps(self, default=lambda o: o.__dict__, indent=2)

# Print a more manageable summary of the tweets
for tweet in cursor.items():
    print(TweetSummary(tweet).to_json())

Cool beans, dudes. Now we'll see much more manageable output. Here's an example of one of the summaries that gets printed:

{
  "author_id": 511012448,
  "author_login": "softwarejameson",
  "author_name": "Jameson",
  "id": 1356759533310402560,
  "content": "@drpoindexter @ASpittel Haha, I love how this is tied back to the real world. So creative!",
  "favorite_count": 1,
  "retweet_count": 0
}

This data corresponds to my reply to @drpoindexter and @ASpittel, quoted in the content field above.

Now, let's see what happens when we pass the content text to Comprehend.

text = "@drpoindexter @ASpittel Haha, I love how this is tied back to the real world. So creative!"

comprehend = boto3.client(service_name='comprehend', region_name='us-east-1')
response = comprehend.detect_sentiment(Text=text, LanguageCode='en')
sentiment = response['Sentiment'].lower()
print("This tweet was {}.".format(sentiment))

We can see that:

This tweet was positive.

Building a TweetAnalyzer

In order to run analysis on lots of tweets, and to start sorting and filtering various results, we need some more tools.

Let's build a few more abstractions to help ourselves scale the problem up.

A little class to hold Twitter credentials:

class TwitterCredentials:
    def __init__(self, file_path):
        with open(file_path) as f:
            data = json.load(f)
            self.api_key = data['api_key']
            self.api_secret = data['api_secret']
            self.access_token = data['access_key']
            self.access_secret = data['access_secret']
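The class above expects a JSON file with four keys: api_key, api_secret, access_key, and access_secret. Here's a small sketch that writes such a file with placeholder values. The twitter-credentials.example.json filename and the write_credentials helper are my own inventions for illustration; copy the result to whatever path you pass to TwitterCredentials, and fill in your real keys.

```python
import json

# Placeholder values only. Substitute the keys from the Twitter Developer Portal.
placeholder_credentials = {
    "api_key": "YOUR_API_KEY",
    "api_secret": "YOUR_API_SECRET",
    "access_key": "YOUR_ACCESS_TOKEN",
    "access_secret": "YOUR_ACCESS_TOKEN_SECRET",
}


def write_credentials(file_path: str, data: dict) -> None:
    """Hypothetical helper: dump the credentials dict as pretty-printed JSON."""
    with open(file_path, "w") as f:
        json.dump(data, f, indent=2)


# Written to an ".example" file so a real credentials file is never overwritten.
write_credentials("twitter-credentials.example.json", placeholder_credentials)
```

Keep the real file out of version control, since it holds secrets.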

And a class containing some of the low-level functionalities we've talked about thus far:

class TweetAnalyzer:
    def __init__(self, username: str, creds: TwitterCredentials):
        # Twitter stuff.
        auth = tweepy.OAuthHandler(creds.api_key, creds.api_secret)
        auth.set_access_token(creds.access_token, creds.access_secret)
        self.tweepy_api = tweepy.API(auth)
        self.username = username

        # Sentiment stuff.
        self.comprehend = boto3.client(service_name='comprehend', region_name='us-east-1')

    def _sentiment(self, summary: TweetSummary):
        response = self.comprehend.detect_sentiment(Text=summary.content, LanguageCode='en')
        scores = response['SentimentScore']
        sentiment = Sentiment(
            response['Sentiment'],
            scores['Positive'],
            scores['Neutral'],
            scores['Negative'],
            scores['Mixed']
        )
        return TweetSentiment(summary, sentiment)

    def _tweets(self):
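        # screen_name names the account explicitly; newer Tweepy releases don't accept the generic id argument
        cursor = tweepy.Cursor(self.tweepy_api.user_timeline, screen_name=self.username)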
        return cursor.items()
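The _sentiment method leans on two small data classes, Sentiment and TweetSentiment, that aren't listed in this article. Here's a minimal sketch of what they might look like. The field names are taken from the JSON output shown further down; the conversion of Comprehend's 0.0 to 1.0 scores into whole percentages is my assumption, inferred from the 99 and 0 values in that output.

```python
import json
from types import SimpleNamespace


class Sentiment:
    def __init__(self, label, positive, neutral, negative, mixed):
        self.label = label
        # Assumption: Comprehend scores are floats in [0, 1]; store them
        # as whole percentages (truncated), e.g. 0.9987 becomes 99.
        self.positive = int(positive * 100)
        self.neutral = int(neutral * 100)
        self.negative = int(negative * 100)
        self.mixed = int(mixed * 100)


class TweetSentiment:
    def __init__(self, summary, sentiment):
        self.summary = summary
        self.sentiment = sentiment

    def to_json(self):
        # Same trick as TweetSummary: serialize nested objects via __dict__.
        return json.dumps(self, default=lambda o: o.__dict__, indent=2)


# Stand-in for a TweetSummary so that this sketch runs on its own.
summary = SimpleNamespace(content="So creative!", favorite_count=1)
sentiment = Sentiment("POSITIVE", 0.9987, 0.0009, 0.0003, 0.0001)
print(TweetSentiment(summary, sentiment).to_json())
```

The argument order of Sentiment matches the call in _sentiment: label first, then positive, neutral, negative, and mixed.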

Most Positive & Negative Tweets

We now have enough to start working on bigger problems. Let's try to analyze all of my Twitter content and find the most positive tweets:

credentials = TwitterCredentials('twitter-credentials.json')
analyzer = TweetAnalyzer('softwarejameson', credentials)
tweets = [TweetSummary(t) for t in analyzer._tweets()]
sentiments = [analyzer._sentiment(t) for t in tweets]
sentiments.sort(reverse=True, key=lambda s: s.sentiment.positive)
for s in sentiments:
    print(s.to_json())
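The loop above makes one Comprehend request per tweet. If that gets slow on a big timeline, Comprehend also has a BatchDetectSentiment operation that takes up to 25 documents per call. A rough sketch, not what this article's script does: chunked and batch_sentiments are hypothetical helper names, and the comprehend argument is assumed to be a Boto3 Comprehend client like the one created earlier.

```python
from typing import Iterator, List

BATCH_LIMIT = 25  # BatchDetectSentiment accepts at most 25 documents per call


def chunked(items: List[str], size: int = BATCH_LIMIT) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_sentiments(comprehend, texts: List[str]) -> List[dict]:
    """Hypothetical helper: results for the texts that succeeded, in input order."""
    results = []
    for batch in chunked(texts):
        response = comprehend.batch_detect_sentiment(
            TextList=batch, LanguageCode="en"
        )
        # Index is the position within this batch. Documents that fail
        # are reported in response["ErrorList"] and are skipped here.
        results.extend(sorted(response["ResultList"], key=lambda r: r["Index"]))
    return results
```

Each entry in the returned list carries the same Sentiment and SentimentScore fields as a single DetectSentiment response.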

It turns out that the third most positive Tweet in my account is the reply to one of Ali Spittel's tweets, as above!

In our data model:

{
  "summary": {
    "author_id": 511012448,
    "author_login": "softwarejameson",
    "author_name": "Jameson",
    "id": 1356759533310402560,
    "content": "@drpoindexter @ASpittel Haha, I love how this is tied back to the real world. So creative!",
    "favorite_count": 1,
    "retweet_count": 0
  },
  "sentiment": {
    "label": "POSITIVE",
    "positive": 99,
    "neutral": 0,
    "negative": 0,
    "mixed": 0
  }
}

Now, let's see my most negative tweets. Oh boy. Brace yourself.

To see them, all I have to do is change the sort key that I used above. Instead of:

sentiments.sort(reverse=True, key=lambda s: s.sentiment.positive)

It will become:

sentiments.sort(reverse=True, key=lambda s: s.sentiment.negative)

Lemme get a drumroll 🥁...


{
  "summary": {
    "author_id": 511012448,
    "author_login": "softwarejameson",
    "author_name": "Jameson",
    "id": 1364331040395952128,
    "content": "I fucking hate bureaucracy.",
    "favorite_count": 10,
    "retweet_count": 0
  },
  "sentiment": {
    "label": "NEGATIVE",
    "positive": 0,
    "neutral": 0,
    "negative": 99,
    "mixed": 0
  }
}

Now, you probably read this and thought:

"Hey, my guy likes a lightweight process. That's cool, my guy. Some say I'm a little like that, myself, 😉."

But, that's not how Amazon Comprehend interprets it. 🤷

In Part 2, let's expand on this base. We'll analyze how others perceive these tweets, using data. Stay tuned!
