
mayank-p

Posted on • Edited on

Sesame Street: Harmless Kids TV Show or Skynet's First Step?

When most people think of Sesame Street, they believe it to be an innocent kids' TV show that teaches our youth words and math. Count von Count, with his number knowledge. Big Bird, with his goofy and awkward height, which is only matched by his big heart. And Cookie Monster, whose only goal in life is to eat cookies--or is it? These seemingly harmless characters have an ulterior motive. A much darker one that would end humanity as we know it. Their easy-going demeanors are meant to catch us off guard until it's too late... BERT, ERNIE, and most importantly, their notorious ringleader ELMo are more than just Sesame Street characters. They are at the forefront of cutting-edge AI designed to learn and adapt to human language. In other words, they learn how to use and communicate through our languages for reasons unknown. It's time to wake up, everyone. Elmo is evil.

No, not really. Elmo isn't evil. Not yet, at least.

Predicting what to say is extremely hard. Even humans sometimes have a tough time thinking of what to say, like when a long-time crush decides to talk to you or when you're about to give a speech in front of an audience. Similarly, computers have a hard time predicting what to say when they're given a sentence or test set. As humans, it takes most of our toddler years to be able to form proper sentences and learn when to use the words we've learned in the proper situation (it doesn't make sense to say "I like rainbow ponies and unicorns" during a company board meeting). Learning a language is supervised learning, trialed and tested over years of human development, and it can continue our whole lifetimes. As a matter of fact, behavioral scientists say the best time to learn a language is usually our toddler years, which, ironically, is the target audience of Sesame Street. Coincidence? I think not.

ELMo

ELMo stands for Embeddings from Language Models. Embedding is the process of converting words into vectors in a shared space. This is advantageous because you can then apply geometric techniques to strings in order to predict or visualize them. Existing word embeddings at the time, like GloVe, assigned the same vector to the same word every time, without using any context clues. However, this has flaws that can be improved upon.
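To make that concrete, here's a tiny toy sketch of a static, GloVe-style embedding table. The three-dimensional vectors are completely made up for illustration (real GloVe vectors have 50-300 dimensions), but the idea is the same: every word maps to one fixed vector, and you can then do geometry on words, like measuring how similar two of them are.

```python
import numpy as np

# Made-up, 3-dimensional "embeddings" -- purely illustrative.
static_embeddings = {
    "paint": np.array([0.7, 0.1, 0.3]),
    "brush": np.array([0.6, 0.2, 0.4]),
    "fence": np.array([0.1, 0.9, 0.2]),
}

def cosine_similarity(a, b):
    """How closely two word vectors point in the same direction (1.0 = identical direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Because the table is static, "paint" gets this exact same vector
# in every sentence it ever appears in -- no context involved.
print(cosine_similarity(static_embeddings["paint"], static_embeddings["brush"]))
print(cosine_similarity(static_embeddings["paint"], static_embeddings["fence"]))
```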

For example:

Sentence 1: The worker has to buy the paint.

Sentence 2: The worker tried to paint the fence.

As English-speaking humans, we know that "paint" has a different meaning in each sentence, so it should be represented differently. Previous word embeddings would give "paint" the same value, even though it represented different ideas. ELMo differed from other existing word embeddings because it gave the same word different vector values based on the context of the sentence. After converting the words, it puts its embeddings through its magic sauce of RNNs and CNNs in order to spit out its predicted outcome, drawing each word as a vector and then converting it back to a string. This seems innocent enough, so how does this relate to evil Elmo? Well, in his song, Elmo's World, he talks about loving his crayons. What can crayons do? Draw! Yes. I'm saying Elmo uses his crayons to draw embeddings and try to speak like a human.
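If you want to see the context-dependence for yourself, here's a rough sketch assuming the ElmoEmbedder class that shipped with older releases of the allennlp library (that specific class and its default pretrained weights are my assumption about the older API, not something from this post; newer allennlp versions have removed it):

```python
# Rough sketch assuming an older allennlp release (e.g. pip install allennlp==0.9.0),
# which shipped an ElmoEmbedder class; newer allennlp versions no longer include it.
from allennlp.commands.elmo import ElmoEmbedder
from scipy.spatial.distance import cosine

elmo = ElmoEmbedder()  # downloads the default pretrained ELMo weights

sentence_1 = ["The", "worker", "has", "to", "buy", "the", "paint", "."]
sentence_2 = ["The", "worker", "tried", "to", "paint", "the", "fence", "."]

# embed_sentence returns an array of shape (3 layers, num_tokens, 1024)
vectors_1 = elmo.embed_sentence(sentence_1)
vectors_2 = elmo.embed_sentence(sentence_2)

paint_as_noun = vectors_1[2][6]  # top layer, "paint" is token 6 in sentence 1
paint_as_verb = vectors_2[2][4]  # top layer, "paint" is token 4 in sentence 2

# The similarity comes out below 1.0 because ELMo conditions each vector
# on the surrounding words, so the noun and the verb get different vectors.
print("similarity:", 1 - cosine(paint_as_noun, paint_as_verb))
```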

In case you forgot the song: https://www.youtube.com/watch?v=OeVp9S1HzqI

BERT

Similar to ELMo, BERT, which stands for Bidirectional Encoder Representations from Transformers, is another language model that is used to predict words. Unlike ELMo, though, BERT doesn't rely on RNNs--to see why that matters, let's first look at how RNNs work.

RNNs are neural nets that add a time variable: they read the input one word at a time, in order. This method is commonly used in translators that take each word you say and convert it to a different language. However, the problem with this is that an RNN tends to "forget" words said earlier in a paragraph. This can hurt the prediction model because it loses context clues provided earlier in the text. Furthermore, RNNs read data and do their calculations in only one direction, which also limits their predictions.
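Here's a minimal sketch of that one-word-at-a-time behavior, assuming PyTorch's nn.RNN purely for illustration:

```python
import torch
import torch.nn as nn

# An RNN consumes a sentence one token at a time, left to right,
# carrying everything it "remembers" in a single fixed-size hidden state.
embedding_dim, hidden_dim = 8, 16
rnn = nn.RNN(input_size=embedding_dim, hidden_size=hidden_dim, batch_first=True)

# Pretend these are embeddings for a 7-word sentence (batch of 1).
sentence = torch.randn(1, 7, embedding_dim)

outputs, final_hidden = rnn(sentence)
print(outputs.shape)       # torch.Size([1, 7, 16]) -- one hidden state per time step
print(final_hidden.shape)  # torch.Size([1, 1, 16]) -- the only "memory" left at the end

# Anything word 1 contributed has to survive being squeezed through that
# fixed-size state at every later step, which is why long-range context
# tends to get "forgotten", and the pass only ever runs in one direction.
```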

Instead of reading words from left to right (like we do in English) or right to left (like we do in Arabic), BERT uses a Transformer in order to read text both ways, which is why it's called bidirectional. The Transformer in BERT uses attention to focus on certain details in the input. Take a moment to try to identify the important features you notice in the picture below before you go on to the next paragraph.

(Example picture: a landscape with a bridge in the middle and mountains against the sky)

In the picture above, our brain notices a couple of important features. The first is the bridge, which is clearly outlined and in the middle. The second is the mountains in the background, which contrast with the sky. The bridge attracts our brain because it is in the middle, and the mountains attract our brain because of the color contrast between them and the sky. Our brain decides the rest of the picture, such as the middle of the sky or the water, is not as important. Over the years, our brain has been conditioned to generally look for objects in the middle of a picture and objects that contrast greatly with their surroundings. This selective ENHANCE! feature of our brain is essentially what the Transformer's attention mechanism does.
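For the curious, here's a toy NumPy sketch of the scaled dot-product attention that does this focusing inside a Transformer; the vectors are random and made up, so treat it as an illustration of the mechanism rather than anything pulled from BERT itself:

```python
import numpy as np

def scaled_dot_product_attention(queries, keys, values):
    """Toy version of Transformer attention: each query scores every key,
    the scores become weights via softmax, and the output is a weighted
    average of the values -- i.e. which positions to 'enhance'."""
    d_k = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    return weights @ values, weights

# Made-up 4-dimensional vectors for a 3-token input.
x = np.random.randn(3, 4)
output, weights = scaled_dot_product_attention(x, x, x)  # self-attention: Q = K = V
print(weights.round(2))  # each row sums to 1: how strongly each token attends to the others
```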

The Transformer then encodes the training data into vectors, like ELMo, and gives each of them attention weights at each node while reading the sample backwards and forwards. BERT also embeds each word based on its position in the sequence and on which sentence it belongs to, not just the word itself. This allows the neural nets to train on a lot more data and come up with a better model.
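If you want to poke at these pieces yourself, here's a sketch assuming the Hugging Face transformers port of BERT (the original release was in TensorFlow; the model name bert-base-uncased is the standard pretrained checkpoint):

```python
# Sketch assuming the Hugging Face transformers library (pip install transformers torch).
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

# BERT's input representation is the sum of three embeddings:
print(model.embeddings.word_embeddings)        # one vector per token in the vocabulary
print(model.embeddings.position_embeddings)    # one vector per position in the sequence
print(model.embeddings.token_type_embeddings)  # one vector per sentence (A or B)

inputs = tokenizer("The worker tried to paint the fence.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One contextual vector per token, built by attending in both directions.
print(outputs.last_hidden_state.shape)  # torch.Size([1, 10, 768]) for this sentence
```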

Fun facts:

BERT was trained on Wikipedia articles.

There's a Small BERT, a Medium BERT, a Base BERT, and a Large BERT.

Researching the complexity of language models has left me in awe at how powerful and smart our brains are. The smartest and largest companies in the world are trying to imitate our brain--something we take for granted. Language processing still has a long way to go before it can accurately predict what to say. However, it took evolution millions of years to get the brain to this point, while it has taken only about a decade for AI to go from nothing to closely modeling the language processing the brain does. The rapid growth and development is astonishing, but it'll still take some time. So for now, I guess Skynet still has to wait a while.

If you want to see an example of language processing models in action, here's a great link about someone who fed a model Hallmark movies and had it generate a script.

https://twitter.com/KeatonPatti/status/1072877290902745089

Update 1: I see suspicious "people" following me around. I think Skynet is on to me...

Update 2: Human fine. Nothing wrong.

Information from:
https://medium.com/@jonathan_hui/nlp-bert-transformer-7f0ac397f524
https://medium.com/synapse-dev/understanding-bert-transformer-attention-isnt-all-you-need-5839ebd396db
https://www.analyticsvidhya.com/blog/2019/03/learn-to-use-elmo-to-extract-features-from-text/
Picture from:
https://www.deviantart.com/nickchoubg/art/Landscape-Wallpaper-299457255

Top comments (5)

Ben Lovy

I don't understand this post but I'm on board.

mayank-p

Accidentally published this before finishing!

Ben Lovy

you can set published to false in your post's metadata to pull it back to draft status!

Michael Chen

wake up sheeple

Matt Eland

How can I help contribute to Elmo's open source skynet project?