Thomas J. Weinandy

Building a Twitter bot in Python to write bigram poems

I describe how and why I built a Twitter bot in Python that converts tweets into beautiful expressions. I also define what a bigram poem is and propose why it should be a literary creation. All code is available on my GitHub page.

The inspiration

During my third year of graduate school I began research on text analysis and how to apply this to natural language processing (NLP) in social media. I presented a research proposal where I wanted to apply some data mining and NLP techniques using real-time information from Twitter. I gave an example during the presentation on how the text from one tweet can be converted to a list of bigram strings (more on this later). Here was the slide:

I read the list aloud and commented on the poetic sound it had. I said that if I don’t become an economist then maybe I will have a future as a slam poet where I read tweets as some kind of bigram poem. My classmates laughed at the passing joke.

But I later got to thinking, “Why not?”

Why

The purpose of the project is the novelty of creating a Twitter bot that rewrites public tweets into bigram poems. It allowed me to improve my coding skills and become familiar with the Twitter API. It might even be educational by teaching others an NLP technique, but probably not. I can, however, claim that bigram poems did not exist before this humble blog post.

My contribution through this project is to propose the use of bigrams not as a mere means of text analysis, but as a form of poetic expression. I build a function that takes utilitarian sentences and rewrites them into concise poems. This shows how common language has the capacity to sound elegant. Either that or it is merely a cute little project.

What is a bigram poem?

Good question! A bigram poem is any phrase converted into a list of consecutive word pairs.

Context is very important to understand the meaning of words. If an online review describes a restaurant where “the food is not good” we as humans easily understand the negative connotation. If text analysis only considers the frequency of individual words, then a computer would likely interpret the word “good” as being positive sentiment and consider the phrase also as positive. A common remedy to this problem is to break the phrase apart into n-grams, or groups of n-many consecutive words. A bigram is one such example where n=2. A bigram of the previous phrase would thus read:

"the food", "food is", "is not", "not good"

It is easier for a computer to analyze these and appropriately label the word pair "not good" as negative sentiment.
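
As a minimal sketch of the idea, the snippet below builds n-grams with plain Python rather than a library. The function name ngrams and its parameters are illustrative only and are not part of the project code.

```python
def ngrams(words, n=2):
    """Return every run of n consecutive words as a tuple (illustrative helper)."""
    # zip over n staggered copies of the list; zip stops at the shortest copy
    return list(zip(*(words[i:] for i in range(n))))


words = "the food is not good".split()
print(ngrams(words, 2))
# [('the', 'food'), ('food', 'is'), ('is', 'not'), ('not', 'good')]
```

With n=2 the pair ('not', 'good') survives as a unit, which is what lets a sentiment model treat it differently from "good" on its own.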

The bigram_poem() function

The first step to building a Twitter bot that sends out bigram poems was to write a function that automatically converts phrases into bigram poems. The Python code below describes the process:

import nltk                                                           # natural language tool kit is a must for NLP in Python

def bigram_poem(phrase):                                              # define new function bigram_poem   
    rejects = '¿.?!,[]|"“”();…{}«•*+@~><'                             # define punctuation to be removed
    phrase_reject = phrase.translate({ord(c): None for c in rejects}) # remove defined punctuation
    phrase_split = phrase_reject.split(' ')                           # split phrase by whitespace
    phrase_clear = list(filter(None, phrase_split))                   # strip any extra whitespace
    phrase_shortened = phrase_clear[0:8]                              # only use the first 8 terms
    phrase_bigram = list(nltk.bigrams(phrase_shortened))              # convert to bigram list
    tw = ''                                                           # create new string
    for b in phrase_bigram:                                           # loop through bigrams
        tw += ' ' + b[0] + ' ' + b[1] + ' \n'                         # add looped bigrams to string w line break
    return tw                                                         # return the finished poem

I will elaborate on a few of the above steps for those without programming experience. First, I discard select punctuation that obstructs the flow of the poem. Next, I split the phrase into words and keep only the first eight to ensure the output is short and sweet. Finally, I convert the remaining words into a list of bigrams and write each pair on its own line.
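
To make the steps easier to follow, here is a rough sketch that returns the intermediate result of each one. It is not the function above: the helper name poem_steps is hypothetical, and it swaps nltk.bigrams for Python's built-in zip so that it runs without installing anything.

```python
REJECTS = '¿.?!,[]|"“”();…{}«•*+@~><'   # same characters as in bigram_poem()


def poem_steps(phrase, max_words=8):
    """Return the result of each step (hypothetical helper, not the project code)."""
    cleaned = phrase.translate({ord(c): None for c in REJECTS})  # 1) drop punctuation
    words = [w for w in cleaned.split(' ') if w]                 # 2) split, skip empty strings
    words = words[:max_words]                                    # 3) keep the first eight words
    pairs = list(zip(words, words[1:]))                          # 4) consecutive word pairs
    return cleaned, words, pairs


cleaned, words, pairs = poem_steps('There is always money in the banana stand.')
for first, second in pairs:
    print(first, second)
```

For this eight-word quote the loop prints seven lines, starting with "There is" and ending with "banana stand".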

I test the function using a list of short quotes from the TV show Arrested Development. I then randomly select a quote from the list and convert it into a bigram poem. Three examples are shown below.

import random

AD = ['I hear the jury’s still out on science.', 'I’m a monster!', 'Baby you got a stew going.', 'Do these effectively hide my thunder?', 'Army had a half day.', 'Say goodbye to these!', 'This party’s going to be Off. The. Hook.', 'There are dozens of us. Dozens!', 'And that’s why you always leave a note.', 'I’m afraid I just blue myself.', 'There is always money in the banana stand.', 'I’ve made a huge mistake.', 'Dead dove. Do not eat.', 'Here’s some money. Go see a Star War.', 'It’s hot ham water!', 'But where does the lighter fluid come from?', 'Get rid of the Seaward.', 'You’re just a chicken.', 'It’s an illusion Michael!', 'For British eyes only', 'Family love Michael.', 'Watch out for hop-ons.', 'They don’t allow you to have bees here.', 'Has anyone in this family seen a chicken?', 'Solid as a rock!', 'Did nothing cancel?', 'I know you’re the big marriage expert.', 'She calls it a mayonegg.', 'It’s vodka. It goes bad once it’s opened.', 'Don’t call it that.', 'On the Next Arrested Development...', 'Luz, that coat costs more than your house!', 'I just want my kids back.', 'I have Pop Pop in the attic.', 'The soup of the day is Bread.', 'My heart is straining through my shirt', 'Maybe, I’ll put it in her brownie.', 'I like hot sailors.', 'I understand more than you’ll never know.', 'Who’d like a banger in the mouth?', 'And that’s why you don’t yell.', 'I don’t care for GOB.', 'No touching! No touching!', 'Glasses off, hair up.', 'And I think I maced a crane.', 'You’re my third least favorite child.', 'Something that says leather daddy?', 'I enjoy scholarly pursuits.', 'You’re a crook, Captain Hook...', 'And this is not a Volvo.', 'Rita corny, Michael.', 'We’re having a fire sale.', 'Tea for dong!', 'They said it was a bob.']

print(bigram_poem(random.choice(AD)))               # Convert random quote to bigram poem

The Twitter API

I set up a Twitter account with the handle @BigramPoetry, apply for developer status on the platform and am accepted. I build the Twitter bot in accordance with the site’s automation rules that explain:

"Provided you comply with all other rules, you may post automated Tweets for entertainment, informational, or novelty purposes. As a reminder, accounts posting duplicative, spammy, or otherwise prohibited content may be subject to suspension."

To adhere to these rules, I clearly explain the intention of the project in the account bio and tag my personal account for ease of communication. To prevent the bot from being spammy, I limit it to one tweet every fifteen minutes.

I am now ready to connect to the Twitter API and do so here using the Twython wrapper:

import json
from twython import Twython  

with open("twitter_credentials.json", "r") as file:                # load credentials from json file (not included for security reasons)
    creds = json.load(file)

twitter = Twython(creds['CONSUMER_KEY'], creds['CONSUMER_SECRET'],  
                    creds['ACCESS_TOKEN'], creds['ACCESS_SECRET']) # add credentials to access API

Note: I do not share the private access information and instead keep it in a separate JSON file. See instructions here on how to securely save API credentials.
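
One possible way to make the loading step a little safer is to check the file before using it. The sketch below assumes the four key names used above; the helper name load_credentials and the REQUIRED_KEYS constant are my own additions, not part of the original code.

```python
import json

# Key names taken from the connection code above
REQUIRED_KEYS = ("CONSUMER_KEY", "CONSUMER_SECRET", "ACCESS_TOKEN", "ACCESS_SECRET")


def load_credentials(path="twitter_credentials.json"):
    """Load API credentials and fail early if a key is missing or empty (hypothetical helper)."""
    with open(path, "r") as file:
        creds = json.load(file)
    missing = [key for key in REQUIRED_KEYS if not creds.get(key)]
    if missing:
        raise KeyError("missing credentials: " + ", ".join(missing))
    return creds
```

Failing at load time gives a clearer message than an authentication error later on. The credentials file itself should stay out of version control.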

Deciding the source text

My first consideration of source material was to use tweets from a curated list of actors, musicians, politicians, entrepreneurs and athletes. This had the advantage of having a large group of followers but the disadvantages of infrequent tweets and the ephemeral nature of celebrities who may quickly lose their relevance. I also considered following trending topics, but this violates the Twitter automation rules.

The solution then was to follow a specific hashtag that will target users who would find this project of interest. My first choice, #NLP, is a poor candidate since it is confused with other acronyms and is not frequently used on Twitter. I instead opt to use tweets that include '#machinelearning' as my corpus.

With source text in mind and API connection established, I can now collect the desired tweets in real time and apply additional formatting, as shown here:

import datetime
from twython import TwythonStreamer  

class MyStreamer(TwythonStreamer):                              # create a class that inherits TwythonStreamer
    def on_success(self, data):                                 # receive data when successful
        if 'text' not in data:                                  # skip non-tweet messages before reading the text
            return
        tweet = data['text']
        poem_body = bigram_poem(tweet)                          # build once and reuse in the checks below
        bigram_len = poem_body.count('\n')                      # number of bigram lines
        # Conditions: 1) at least 3 bigram lines (4 or more words), 2) exclude retweets,
        #    3) exclude links, 4) English only, 5) exclude myself
        if (bigram_len > 2
                and 'RT' not in tweet
                and 'http' not in poem_body
                and data.get('lang') == 'en'
                and data['user']['screen_name'] != 'BigramPoetry'):
            tweet = tweet.replace('\n',' ').replace('VIDEO','') # remove edge cases with bad formatting
            tweet = tweet.replace(' -',' ').replace(' –',' ').replace(' ‘','')
            tweet = tweet.replace('- ',' ').replace(' /',' ').replace('— ',' ')
            tweet = tweet.replace(': ',' ').replace(':)','').replace('&amp','and')
            poem = 'A Bigram Poem inspired by ' + data['user']['screen_name']  # title line
            poem += ':' + '\n' + bigram_poem(tweet)                       # use bigram_poem function
            poem += '   -' + data['user']['name']                         # signature line
            twitter.update_status(status=poem)                            # tweet out on @BigramPoetry account
            print(poem)                                                   # print result, timestamp
            print(datetime.datetime.now(), '\n')
            self.disconnect()                                   # stop stream (so old tweets don't dam up)

    def on_error(self, status_code, data, headers=None):        # when problem with the API (headers is optional so
        print(status_code, data)                                #    Twython versions that also pass headers still work)
        print(datetime.datetime.now())
        self.disconnect()

I avoid foreign language tweets as well as retweets to ensure I am working with original content that is comprehensible to the target audience. After a series of tests, I individually correct for select edge cases where the poem formatting was obstructed by some kind of unconventional character or spacing. Finally, I format each tweet to have a title and signature line.
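
The five conditions can also be written as a standalone function, which makes them testable without a live connection to the API. The sketch below is an approximation rather than the exact logic above: the helper name is_usable is hypothetical, it counts whitespace-separated words instead of bigram lines, and it looks for links in the whole tweet rather than only in the first eight words.

```python
def is_usable(data, own_screen_name='BigramPoetry', min_words=4):
    """Return True if a stream message passes the five conditions (hypothetical helper)."""
    if 'text' not in data:                        # not a tweet at all
        return False
    tweet = data['text']
    return (
        len(tweet.split()) >= min_words           # 1) enough words for three bigram lines
        and 'RT' not in tweet                     # 2) no retweets
        and 'http' not in tweet                   # 3) no links (approximation: whole tweet)
        and data.get('lang') == 'en'              # 4) English only
        and data.get('user', {}).get('screen_name') != own_screen_name  # 5) not the bot itself
    )


sample = {'text': 'Neural nets are fun #machinelearning',
          'lang': 'en', 'user': {'screen_name': 'someone'}}
print(is_usable(sample))
# True
```

Note that the 'RT' check is a plain substring match, so it also rejects original tweets that happen to contain those two capital letters together.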

Pushing the tweets out

Once incoming tweets have been converted to a bigram poem and formatted as desired, they are ready for the twittersphere. I push the tweets out with this:

import time

stream = MyStreamer(creds['CONSUMER_KEY'], creds['CONSUMER_SECRET'],      # credentials for API streaming
                    creds['ACCESS_TOKEN'], creds['ACCESS_SECRET'])
while True:                                                               # run continuously
    try:                                                                  # attempt to run the stream
        stream.statuses.filter(track='#machinelearning')                  # listen until the streamer disconnects itself
        time.sleep(900)                                                   # wait 15 minutes (900 seconds) before starting over
    except Exception as error:                                            # on failure, print the error and keep going
        print("Unexpected error:", repr(error))                           #    (Ctrl+C still stops the script)
        time.sleep(900)

Note how the streaming function shown earlier disconnects after sending out a suitable tweet, which prevents incoming tweets from damming up. Here the while True statement and the time delay restart the stream so that a tweet is posted at most once every 15 minutes and only the most recent tweet meeting the defined criteria is used. The try and except statements together ensure that the stream will not be permanently interrupted when an error occurs.
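
The same restart-and-retry pattern can be tried out without touching Twitter at all. The following is only a sketch under my own assumptions: run_repeatedly, job, wait_seconds and max_runs are illustrative names, and max_runs exists purely so the loop can be stopped during testing.

```python
import time


def run_repeatedly(job, wait_seconds=900, max_runs=None):
    """Call job() over and over, pausing between runs and surviving errors (illustrative helper)."""
    runs = 0
    errors = []
    while max_runs is None or runs < max_runs:
        try:
            job()                                 # e.g. run the stream until one poem is posted
        except Exception as error:                # record the error and carry on
            errors.append(error)
            print("Unexpected error:", repr(error))
        runs += 1
        time.sleep(wait_seconds)                  # pause before the next attempt
    return errors


def flaky_job():
    raise ValueError("simulated API failure")     # stand-in for a failing stream


print(len(run_repeatedly(flaky_job, wait_seconds=0, max_runs=2)))
# 2
```

Because the pause sits outside the try block, the loop waits the same amount of time whether the job succeeded or failed.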

While testing my original code the account was flagged by Twitter as being too spammy. The app was muzzled and lost API writing privileges until it was fixed. Here is one tweet in question:

The issue turned out to be the unsolicited @mention of the user, which the automation rules prohibit. I removed the '@' from the title line and, for good measure, also included it in the list of rejected characters within the bigram function. This means that any @screen_name in a tweet is reduced to just the screen_name after going through the function.
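
A quick way to see the effect of rejecting the '@' character is the short sketch below. The helper name strip_rejects is hypothetical; it uses the same translate technique as the bigram function, here with '@' as the only rejected character.

```python
def strip_rejects(text, rejects='@'):
    """Delete every character listed in rejects (hypothetical helper)."""
    return text.translate({ord(c): None for c in rejects})


print(strip_rejects('Thanks @BigramPoetry for the poem'))
# Thanks BigramPoetry for the poem
```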

Hosting the bot

The final step now is to allow the Twitter bot to continuously run and not depend on my laptop. After a two-day online training, I was able to understand the basics of Amazon Web Services (AWS). I launched an application with Elastic Beanstalk that established a server instance to run the code and allocated cloud storage for keeping record of code deployments (EC2 and S3 respectively, for those in the know).

I am deeply satisfied with the results, shown below in a non-random sample of before and after tweets. Some poems come out a bit odd, but all in all I am impressed with the elegance of composition. Seriously, read them aloud and hear their poetry!

This Twitter bot that automatically writes Bigram Poems may not be what the world needed, but it is certainly what the world deserves.

Gratitude and resources

I would like to first thank my friend Seth Yost for being a sounding board for this project and for tackling some AWS kinks. I also appreciate Emily Cain and Jared, whose blog posts outlined aspects of the process and offered sample code.

For those looking to explore more with the Twitter API in Python, I recommend this guide which I admittedly wish I had discovered before the project was already finished. Finally, check out Botwiki for a list of articles, sample bots and guides for how to build your own.
