I describe how and why I built a Twitter bot in Python that converts tweets into beautiful expressions. I also define what a bigram poem is and propose why it should count as a literary creation. All code is available on my GitHub page.
The inspiration
During my third year of graduate school I began research on text analysis and how to apply this to natural language processing (NLP) in social media. I presented a research proposal where I wanted to apply some data mining and NLP techniques using real-time information from Twitter. I gave an example during the presentation on how the text from one tweet can be converted to a list of bigram strings (more on this later). Here was the slide:
I read the list aloud and commented on the poetic sound it had. I said that if I don’t become an economist then maybe I will have a future as a slam poet where I read tweets as some kind of bigram poem. My classmates laughed at the passing joke.
But I later got to thinking, “Why not?”
Why
The purpose of the project is the novelty of creating a Twitter bot that rewrites public tweets into bigram poems. It allowed me to improve my coding skills and become familiar with the Twitter API. It might even be educational by teaching others an NLP technique, but probably not. I can however claim that bigram poems did not exist before this humble blog post.
My contribution through this project is to propose the use of bigrams not as a mere means of text analysis, but as a form of poetic expression. I build a function that takes utilitarian sentences and rewrites them into concise poems. This shows how common language has the capacity to sound elegant. Either that or it is merely a cute little project.
What is a bigram poem?
Good question! A bigram poem is any phrase converted into a list of consecutive word pairs.
Context is very important for understanding the meaning of words. If an online review describes a restaurant where “the food is not good”, we as humans easily understand the negative connotation. If text analysis only considers the frequency of individual words, then a computer would likely interpret the word “good” as positive sentiment and consider the whole phrase positive as well. A common remedy to this problem is to break the phrase apart into n-grams, or groups of n consecutive words. A bigram is the case where n=2. The bigrams of the previous phrase would thus read:
"the food", "food is", "is not", "not good"
It is easier for a computer to analyze these and appropriately label the word pair "not good" as negative sentiment.
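The same idea works for any n. Here is a minimal sketch in plain Python, assuming naive whitespace tokenization; the helper name `ngrams` is my own for illustration and is not a library function:

```python
def ngrams(phrase, n=2):
    """Return every group of n consecutive words in phrase."""
    words = phrase.split()  # naive whitespace tokenization (an assumption)
    # Slide a window of width n across the list of words.
    return [' '.join(words[i:i + n]) for i in range(len(words) - n + 1)]

print(ngrams('the food is not good'))        # bigrams
print(ngrams('the food is not good', n=3))   # trigrams
```

A phrase with fewer than n words simply yields an empty list.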
The bigram_poem() function
The first step to building a Twitter bot that sends out bigram poems was to write a function that automatically converts phrases into bigram poems. The below Python code describes the process:
import nltk  # Natural Language Toolkit, a must for NLP in Python

def bigram_poem(phrase):  # convert a phrase into a bigram poem
    rejects = '¿.?!,[]|"“”();…{}«•*+@~><'  # punctuation to be removed
    phrase_reject = phrase.translate({ord(c): None for c in rejects})  # remove that punctuation
    phrase_split = phrase_reject.split(' ')  # split phrase on spaces
    phrase_clear = list(filter(None, phrase_split))  # drop empty strings left by repeated spaces
    phrase_shortened = phrase_clear[0:8]  # only use the first 8 words
    phrase_bigram = list(nltk.bigrams(phrase_shortened))  # convert to list of bigrams
    tw = ''  # string that accumulates the poem
    for b in phrase_bigram:  # loop through bigrams
        tw += ' ' + b[0] + ' ' + b[1] + ' \n'  # add one bigram per line
    return tw
I will elaborate on a few of the above steps for those without programming experience. First, I discard select punctuation that obstructs the flow of the poem. Next, I split the phrase into words and keep only the first eight so that the output stays short and sweet. Finally, I convert those words into a list of bigrams and join them into a string with one bigram per line.
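To trace these steps without installing NLTK, here is a sketch of an equivalent function that builds the word pairs with `zip`. The name `bigram_poem_plain` and the `max_words` parameter are assumptions for illustration; it also uses `split()` with no argument, which splits on any whitespace rather than on spaces only.

```python
REJECTS = '¿.?!,[]|"“”();…{}«•*+@~><'  # same punctuation list as bigram_poem()

def bigram_poem_plain(phrase, max_words=8):
    """Dependency-free variant of bigram_poem(), for illustration only."""
    cleaned = phrase.translate({ord(c): None for c in REJECTS})  # drop punctuation
    words = cleaned.split()[:max_words]   # split() also discards extra whitespace
    pairs = zip(words, words[1:])         # consecutive word pairs
    return ''.join(' {} {} \n'.format(a, b) for a, b in pairs)

poem = bigram_poem_plain('There is always money in the banana stand.')
print(poem)
```

Eight words produce seven lines, from "There is" down to "banana stand". The curly apostrophe is not in the reject list, so contractions such as "I’m" stay intact.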
I test the function using a list of short quotes from the TV show Arrested Development. I then randomly select a quote from the list and convert it into a bigram poem. Three examples are shown below.
import random
AD = ['I hear the jury’s still out on science.', 'I’m a monster!', 'Baby you got a stew going.', 'Do these effectively hide my thunder?', 'Army had a half day.', 'Say goodbye to these!', 'This party’s going to be Off. The. Hook.', 'There are dozens of us. Dozens!', 'And that’s why you always leave a note.', 'I’m afraid I just blue myself.', 'There is always money in the banana stand.', 'I’ve made a huge mistake.', 'Dead dove. Do not eat.', 'Here’s some money. Go see a Star War.', 'It’s hot ham water!', 'But where does the lighter fluid come from?', 'Get rid of the Seaward.', 'You’re just a chicken.', 'It’s an illusion Michael!', 'For British eyes only', 'Family love Michael.', 'Watch out for hop-ons.', 'They don’t allow you to have bees here.', 'Has anyone in this family seen a chicken?', 'Solid as a rock!', 'Did nothing cancel?', 'I know you’re the big marriage expert.', 'She calls it a mayonegg.', 'It’s vodka. It goes bad once it’s opened.', 'Don’t call it that.', 'On the Next Arrested Development...', 'Luz, that coat costs more than your house!', 'I just want my kids back.', 'I have Pop Pop in the attic.', 'The soup of the day is Bread.', 'My heart is straining through my shirt', 'Maybe, I’ll put it in her brownie.', 'I like hot sailors.', 'I understand more than you’ll never know.', 'Who’d like a banger in the mouth?', 'And that’s why you don’t yell.', 'I don’t care for GOB.', 'No touching! No touching!', 'Glasses off, hair up.', 'And I think I maced a crane.', 'You’re my third least favorite child.', 'Something that says leather daddy?', 'I enjoy scholarly pursuits.', 'You’re a crook, Captain Hook...', 'And this is not a Volvo.', 'Rita corny, Michael.', 'We’re having a fire sale.', 'Tea for dong!', 'They said it was a bob.']
print(bigram_poem(random.choice(AD))) # Convert random quote to bigram poem
The Twitter API
I set up a Twitter account with the handle @BigramPoetry, apply for developer status on the platform and am accepted. I build the Twitter bot in accordance with the site’s automation rules that explain:
"Provided you comply with all other rules, you may post automated Tweets for entertainment, informational, or novelty purposes. As a reminder, accounts posting duplicative, spammy, or otherwise prohibited content may be subject to suspension."
To adhere to these rules, I clearly explain the intention of the project in the account bio and tag my personal account for ease of communication. To prevent the bot from being spammy, I limit it to one tweet every fifteen minutes.
I am now ready to connect to the Twitter API and do so here using the Twython wrapper:
import json
from twython import Twython

with open("twitter_credentials.json", "r") as file:  # load credentials from json file (not included for security reasons)
    creds = json.load(file)

twitter = Twython(creds['CONSUMER_KEY'], creds['CONSUMER_SECRET'],
                  creds['ACCESS_TOKEN'], creds['ACCESS_SECRET'])  # add credentials to access API
Note: I do not share the private access information and instead have this saved on a separate json file. See instructions here on how to securely save API credentials.
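For readers who have not set this up before, here is a sketch of how such a credentials file might be written and checked. The four key names match the ones read above; the helper names and the placeholder values are assumptions for illustration, and the real file should be kept out of version control.

```python
import json
import os
import tempfile

REQUIRED_KEYS = ('CONSUMER_KEY', 'CONSUMER_SECRET', 'ACCESS_TOKEN', 'ACCESS_SECRET')

def save_credentials(path, creds):
    """Write the credentials dict to a JSON file readable only by its owner."""
    with open(path, 'w') as f:
        json.dump(creds, f)
    os.chmod(path, 0o600)  # owner read/write only (limited effect on Windows)

def load_credentials(path):
    """Load credentials and fail early if an expected key is missing."""
    with open(path) as f:
        creds = json.load(f)
    missing = [k for k in REQUIRED_KEYS if k not in creds]
    if missing:
        raise KeyError('missing credential keys: ' + ', '.join(missing))
    return creds

# Demonstration with placeholder values in a temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), 'twitter_credentials.json')
save_credentials(demo_path, {k: 'PLACEHOLDER' for k in REQUIRED_KEYS})
demo_creds = load_credentials(demo_path)
print(sorted(demo_creds))
```

Checking the keys at load time means a typo in the file shows up immediately rather than as an authentication failure later.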
Deciding the source text
My first idea for source material was to use tweets from a curated list of actors, musicians, politicians, entrepreneurs and athletes. These accounts have the advantage of large followings but the disadvantages of infrequent tweets and the ephemeral nature of celebrities, who may quickly lose their relevance. I also considered following trending topics, but this violates the Twitter automation rules.
The solution then was to follow a specific hashtag that targets users who would find this project of interest. My first choice, #NLP, is a poor candidate since the acronym is shared with unrelated topics and the hashtag is not frequently used on Twitter. I instead opt to use tweets that include '#machinelearning' as my corpus.
With source text in mind and API connection established, I can now collect the desired tweets in real time and apply additional formatting, as shown here:
import datetime
from twython import TwythonStreamer

class MyStreamer(TwythonStreamer):  # create a class that inherits TwythonStreamer
    def on_success(self, data):  # receive data when successful
        if 'text' not in data:  # skip stream messages that are not tweets
            return
        tweet = data['text']
        bigram_len = bigram_poem(tweet).count('\n')  # number of bigram lines in the poem
        # Conditions: 1) at least 3 bigrams (4 or more words), 2) exclude retweets,
        # 3) exclude links, 4) English only, 5) exclude myself
        if bigram_len > 2 \
                and 'RT' not in tweet \
                and 'http' not in bigram_poem(tweet) \
                and data.get('lang') == 'en' \
                and data['user']['screen_name'] != 'BigramPoetry':
            tweet = tweet.replace('\n', ' ').replace('VIDEO', '')  # remove edge cases with bad formatting
            tweet = tweet.replace(' -', ' ').replace(' –', ' ').replace(' ‘', '')
            tweet = tweet.replace('- ', ' ').replace(' /', ' ').replace('— ', ' ')
            tweet = tweet.replace(': ', ' ').replace(':)', '').replace('&', 'and')
            poem = 'A Bigram Poem inspired by ' + data['user']['screen_name']  # title line
            poem += ':' + '\n' + bigram_poem(tweet)  # use bigram_poem function
            poem += ' -' + data['user']['name']  # signature line
            twitter.update_status(status=poem)  # tweet out on @BigramPoetry account
            print(poem)  # print result, timestamp
            print(datetime.datetime.now(), '\n')
            self.disconnect()  # stop stream (so old tweets don't dam up)

    def on_error(self, status_code, data):  # when problem with the API
        print(status_code, data)
        print(datetime.datetime.now())
        self.disconnect()
I avoid foreign-language tweets as well as retweets to ensure I am working with original content that is comprehensible to the target audience. After a series of tests, I individually correct select edge cases where the poem formatting was obstructed by some kind of unconventional character or spacing. Finally, I format each tweet to have a title and signature line.
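The filtering rules are easier to test offline when they live in a standalone function. The sketch below is one way to do that and is not the code the bot runs: the name `is_suitable` and its parameters are hypothetical, the word count ignores the punctuation stripping done in `bigram_poem()`, and retweets are detected by a leading "RT @" or a `retweeted_status` key in the payload (an assumption about the payload) instead of searching for "RT" anywhere in the text, which would also reject words such as "SMART".

```python
def is_suitable(data, own_screen_name='BigramPoetry', min_bigrams=3, max_words=8):
    """Return True if a streamed payload passes the filters described above."""
    text = data.get('text')
    if not text:                          # not a tweet (e.g. a stream notice)
        return False
    words = text.split()[:max_words]      # only the words that would reach the poem
    if len(words) - 1 < min_bigrams:      # too short to make a poem
        return False
    if text.startswith('RT @') or 'retweeted_status' in data:
        return False                      # retweet
    if any('http' in w for w in words):
        return False                      # a link would appear in the poem
    if data.get('lang') != 'en':
        return False                      # English only
    return data.get('user', {}).get('screen_name') != own_screen_name

# Hand-made sample payloads (not real tweets).
ok = {'text': 'SMART models need good data #machinelearning',
      'lang': 'en', 'user': {'screen_name': 'someone'}}
retweet = dict(ok, text='RT @someone: SMART models need good data')
print(is_suitable(ok), is_suitable(retweet))
```

With a helper like this, `on_success` would only need to call it once before formatting and posting the poem.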
Pushing the tweets out
Once incoming tweets have been converted to a bigram poem and formatted as desired, they are ready for the twittersphere. I push the tweets out with this:
import time
import sys

stream = MyStreamer(creds['CONSUMER_KEY'], creds['CONSUMER_SECRET'],  # credentials for API streaming
                    creds['ACCESS_TOKEN'], creds['ACCESS_SECRET'])

while True:  # run continuously
    try:  # attempt to run the stream
        stream.statuses.filter(track='#machinelearning')  # start the stream; returns once it disconnects
        time.sleep(900)  # wait 15 minutes (900 seconds) before starting over
    except Exception:  # when the stream fails, print the error type (Ctrl+C still stops the loop)
        print("Unexpected error:", sys.exc_info()[0])
        time.sleep(900)
Note how, in the previous section, the streaming function disconnects after sending out a suitable tweet to prevent incoming tweets from damming up. Here the while True statement and the time delay restart the stream every 15 minutes, so each post uses only the most recent tweet that meets the defined criteria. The try and except statements together ensure that the stream will not be permanently interrupted when an error occurs.
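The restart-and-retry pattern can be tried out without touching the Twitter API. The sketch below is a generic version under my own assumptions: `run_periodically`, `max_runs` and the injectable `sleep` argument are hypothetical names added so the loop can be exercised quickly in a test.

```python
import time

def run_periodically(job, interval=900, max_runs=None, sleep=time.sleep):
    """Call job() repeatedly, pausing `interval` seconds after each attempt."""
    runs = 0
    errors = []
    while max_runs is None or runs < max_runs:
        try:
            job()
        except Exception as exc:   # keeps looping on errors, but Ctrl+C still exits
            errors.append(exc)
            print('Unexpected error:', repr(exc))
        runs += 1
        sleep(interval)
    return errors

# Demonstration: a job that fails on its second call, with sleeping disabled.
calls = []

def flaky_job():
    calls.append(len(calls))
    if len(calls) == 2:
        raise RuntimeError('simulated API failure')

errors = run_periodically(flaky_job, interval=900, max_runs=3, sleep=lambda s: None)
print(len(calls), 'runs,', len(errors), 'error')
```

For the bot itself, the job would be something like `lambda: stream.statuses.filter(track='#machinelearning')` with `max_runs` left as `None`.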
While testing my original code the account was flagged by Twitter as being too spammy. The app was muzzled and lost API writing privileges until it was fixed. Here is one tweet in question:
It turned out the issue was in the unsolicited @mention of the user as prohibited in the automation rules. I remove the '@' from the title line and for good measure also include it in the list of rejected characters within the bigram function. This means that any tweet including an @screen_name after going through the function will be reduced to just their screen_name.
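A small sketch of that effect, using the same `str.translate` approach as the bigram function; the helper name `strip_mentions` and the sample text are made up for illustration:

```python
def strip_mentions(text):
    """Remove '@' so that an @screen_name becomes a plain screen_name."""
    return text.translate({ord('@'): None})

print(strip_mentions('Great thread by @BigramPoetry on #machinelearning'))
```

The name still appears in the poem, but without the '@' it no longer notifies the user.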
Hosting the bot
The final step now is to allow the Twitter bot to continuously run and not depend on my laptop. After a two-day online training, I was able to understand the basics of Amazon Web Services (AWS). I launched an application with Elastic Beanstalk that established a server instance to run the code and allocated cloud storage for keeping record of code deployments (EC2 and S3 respectively, for those in the know).
I am deeply satisfied with the results, shown below in a non-random sample of before and after tweets. Some poems come out a bit odd, but all in all I am impressed with the elegance of composition. Seriously, read them aloud and hear their poetry!
This Twitter bot that automatically writes Bigram Poems may not be what the world needed, but it is certainly what the world deserves.
Gratitude and resources
I would first like to thank my friend Seth Yost for being a sounding board for this project and for tackling some AWS kinks. I am also grateful to Emily Cain and Jared for their blog posts, which outlined aspects of the process and offered sample code.
For those looking to explore more with the Twitter API in Python, I recommend this guide which I admittedly wish I had discovered before the project was already finished. Finally, check out Botwiki for a list of articles, sample bots and guides for how to build your own.