Christian
Text Poetry Completer

It's project time at Flatiron. My group and I were brainstorming coding projects to complete by the end of the week when we stumbled across RiTa.js, a JavaScript library written by Daniel Howe. It gave us a ton of features to play with: decomposing sentences into parts of speech (nouns, verbs, etc.), counting syllables in words, and a 20k-word lexicon we could query for random words, even filtering by syllable count and part of speech.

After following along with a code demonstration by Daniel Shiffman over at The Coding Train, we settled on a project that modifies famous poems (or poems users input): every noun gets replaced with a random noun from the RiTa lexicon, and the modified poem is read aloud using another JS library for text-to-speech called ResponsiveVoice. It had all the appeal of something that can be accomplished in a week, while being flexible enough for us to get creative and take the text generation and text-to-speech as far as we'd like.
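Here's a rough sketch of that idea. I'm assuming RiTa 2.x-style calls (RiTa.tokenize, RiTa.pos, RiTa.randomWord, RiTa.untokenize) plus ResponsiveVoice's speak(), and remixPoem is just a name I made up; the exact API differs between RiTa versions, so treat this as a sketch rather than our finished code:

// Sketch: assumes a RiTa 2.x-style API.
// Replace every noun in a poem with a random noun from RiTa's lexicon,
// then read the result aloud with ResponsiveVoice.
function remixPoem(poem) {
  const words = RiTa.tokenize(poem)      // split into word tokens
  const tags = RiTa.pos(words)           // part-of-speech tag for each token
  const remixed = words.map((word, i) =>
    tags[i].startsWith('nn')             // 'nn', 'nns', etc. are noun tags
      ? RiTa.randomWord({ pos: 'nn' })   // swap in a random noun
      : word
  )
  return RiTa.untokenize(remixed)
}

responsiveVoice.speak(remixPoem('The fog comes on little cat feet.'), 'UK English Male')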

I did some research into the different ways software generates text and came across simple random text generation as well as something called Markov chains. Essentially, a Markov chain uses probability to predict which words follow other words and generates text from that model. Although we ended up going with a plain random text generator, I thought it was interesting how a Markov chain could be applied to this sort of problem.

Around the internet I've always seen computer-generated text trained on things like Joe Rogan podcasts or Trump speeches. Markov chains are a very basic model for text generation, and the actual implementation can be fairly straightforward. Alex Kramer at Medium wrote about his implementation:

function markovChainGenerator(text) {
  const textArr = text.split(' ')
  const markovChain = {}
  for (let i = 0; i < textArr.length; i++) {
    // normalize: lowercase and strip punctuation (/g removes every match, not just the first)
    const word = textArr[i].toLowerCase().replace(/[\W_]/g, '')
    if (!markovChain[word]) {
      markovChain[word] = []
    }
    // record the word that follows this one, if there is one
    if (textArr[i + 1]) {
      markovChain[word].push(textArr[i + 1].toLowerCase().replace(/[\W_]/g, ''))
    }
  }
  return markovChain
}

All he uses is a hash that stores every word found in a text, with each word key mapping to the list of all words that follow it. From there, generating text is just repeatedly picking a random value from the current word's list. I thought this was really elegant, and it isn't something unmanageable like a neural network or some other machine learning algorithm. It's very approachable.
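To make that concrete, here's a minimal sketch of that random walk over the chain the function above builds (the starting word and output length are arbitrary choices of mine):

function generateText(markovChain, startWord, numWords = 20) {
  let word = startWord
  const result = [word]
  for (let i = 1; i < numWords; i++) {
    const followers = markovChain[word]             // every word seen after this one
    if (!followers || followers.length === 0) break // dead end: stop early
    word = followers[Math.floor(Math.random() * followers.length)]
    result.push(word)
  }
  return result.join(' ')
}

const chain = markovChainGenerator('the cat sat on the mat and the cat slept')
console.log(generateText(chain, 'the', 8))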

However, RiTa comes with a nice Markov chain for text and sentence generation right out of the gate. And instead of using only single words as keys, you can specify how many words each unit contains (1, 2, ... or n words, otherwise known as the chain's n-gram size). This is really neat and allows for some fun results. I took the opportunity to generate some faux Bob Ross quotes by training the Markov chain on existing ones. The experiment took all of three lines (not counting the chunk of text assigned to our variable "text"):

rm = new RiMarkov(2) // 2 is the n-gram size: each unit is two words long
rm.loadText(text)    // train the chain on the existing Bob Ross quotes

then:

rm.generateSentence()
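To pull a handful of quotes at once, a simple loop over that same call does the trick:

const quotes = []
for (let i = 0; i < 4; i++) {
  quotes.push(rm.generateSentence())
}
console.log(quotes.join('\n'))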

And out popped these gems.

"Like I always throw the fish back into the water, just put a few indications of sticks and twigs and other little things in there. . . . .."
-Bob Ross

"Boy, I gotta put in a big tree."
-Bob Ross

"You can be an alligator or Georgia somewhere down there, is believing that you did this canvas."
-Bob Ross

"Artists are a gentle whisper, we just go there."
-Bob Ross
