Mohit Yadav

Introducing 🕷 Spitey-Sense: Simulate conversations with Spider-Man using 🪄 GPT-3 and Deepgram ☘️

🚀 Introduction

Hey there! 👋 Have you ever wanted to talk to your favorite superheroes, but they're either unreachable or too busy shooting their next blockbuster?

Well, with 🕷 Spitey-Sense, you can interact with a simulated version of Peter Parker! The idea is simple: OpenAI's GPT-3 was trained (fine-tuned) on scripts from various Spider-Man movies, so the AI could learn Peter Parker's personality. In the end, it could predict what Peter Parker would most likely have said if he were asked something.

OpenAI

For example, GPT-3 characterizes Peter's personality as having a gentle sense of humor and a strong sense of responsibility. The AI therefore generates responses to your conversation much the way Peter Parker would.

The goal of this project was to create an interface that communicates seamlessly with the AI engine and shows responses to statements in real time using web-sockets, along with training the model to behave the required way.

🪄 Demo

As soon as the user opens the website, they're greeted with this landing page, from where they can open the menu or dive right into the fun part (talking).

Home Page

Upon navigating to the /chat route, users are asked to grant the app microphone permission so that they can interact with the AI. A web-socket connection is created with the backend, and the audio is constantly monitored.
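
To give an idea of the shape of that connection, here's a minimal sketch of a backend WebSocket endpoint the /chat page could stream microphone audio to. The post doesn't name the backend framework, so FastAPI here is purely an assumption for illustration.

```python
# A minimal sketch, assuming a FastAPI backend (an assumption; the post
# doesn't name the framework): the WebSocket endpoint /chat streams mic audio to.
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

@app.websocket("/chat")
async def chat_socket(ws: WebSocket):
    await ws.accept()
    try:
        while True:
            # Raw audio chunks arrive from the browser's microphone stream.
            chunk = await ws.receive_bytes()
            # ...forward each chunk to Deepgram's live transcription socket
            # (see the Deepgram sketch further down) and watch for pauses...
    except WebSocketDisconnect:
        pass
```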

Demo

If the user doesn't speak for a few moments, the conversation so far is sent to the OpenAI model, and the AI replies just as Peter Parker/Spider-Man would.
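
That "send the conversation to the model" step boils down to a single completion call. Below is a hedged sketch using the openai Python library as it looked in the GPT-3 era; the persona prompt wording is my own illustration, not the app's exact prompt.

```python
# A hedged sketch of the "conversation so far -> OpenAI" step, using the
# openai Python library as it looked in the GPT-3 era. The persona prompt
# below is an illustration, not the app's exact prompt.
import openai

openai.api_key = "YOUR_OPENAI_API_KEY"  # assumption: loaded from config/env

def spidey_reply(conversation: str) -> str:
    prompt = (
        "The following is a conversation with Peter Parker (Spider-Man). "
        "He is witty, kind, and has a strong sense of responsibility.\n\n"
        f"{conversation}\nPeter:"
    )
    response = openai.Completion.create(
        model="text-davinci-001",
        prompt=prompt,
        max_tokens=120,
        temperature=0.8,
        stop=["You:"],
    )
    return response.choices[0].text.strip()

# Example: spidey_reply("You: Hey Peter, how's the web-slinging going?")
```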

๐Ÿ‹๏ธโ€โ™€๏ธ Training/Fine-Tuning the AI Model

As for the model, I used the text-davinci-001 model, since it was the most capable of OpenAI's general-purpose GPT-3 models at the time, with the largest parameter count in the family. It's also fairly expensive, but worth it.

Fine-Tuning

For the data, I scoured the internet for scripts and tried, fruitlessly, to extract information from the PDF script files I got from movie databases; classifying and sorting the data according to the requirements was a nightmare. The number of spaces, tabs, etc. had to be accounted for, and they often overlapped. So I continued my search for a better solution.

💿 Getting access to a dataset

After visiting many movie-db sites, I came across an article where someone had created a character simulation using a dataset from Kaggle. I had found exactly what I needed... I quickly downloaded the dataset, which contained script data for hundreds of titles. Once I found a super-hero movie, it was time to extract the data from the .csv files.

Kaggle Files Structure

For this purpose, I used 🐼 pandas and loaded the data from the multiple files, which were partitioned like a relational database, with unique and foreign keys for each movie, character, etc. I sourced the conversations file and extracted the plain-text lines along with the characters speaking them. In the end, I had around 80-90 lines with either Peter Parker or Spider-Man as an active speaker.
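
Roughly, the extraction looked like the sketch below. The file and column names are hypothetical (the actual Kaggle CSVs have their own naming); the point is the join-then-filter pattern.

```python
# A sketch of the extraction step. File and column names are hypothetical --
# the Kaggle dataset splits scripts across several CSVs joined by IDs --
# but the join-then-filter pattern is the same.
import pandas as pd

movies = pd.read_csv("movies.csv")          # movie_id, title, ...
characters = pd.read_csv("characters.csv")  # character_id, movie_id, name
lines = pd.read_csv("lines.csv")            # line_id, character_id, text

# Stitch the "relational" pieces back together.
df = lines.merge(characters, on="character_id").merge(movies, on="movie_id")

# Keep only lines spoken by Peter Parker / Spider-Man in a Spider-Man title.
spidey = df[
    df["title"].str.contains("Spider-Man", case=False, na=False)
    & df["name"].isin(["PETER PARKER", "SPIDER-MAN"])
]
print(spidey[["name", "text"]].head())
```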

I fed them into the model, and though the wait was around a few hours, I was finally able to get the custom model up and running.
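
For reference, OpenAI's fine-tuning flow at the time expected a JSONL file of prompt/completion pairs, uploaded via its CLI. Here's a simplified sketch of that step; the pairing logic and file names are illustrations, not the project's exact code.

```python
# A simplified sketch of turning the extracted lines into OpenAI's (old)
# fine-tuning format: one JSON object per line with "prompt"/"completion".
# In practice each Peter line would be paired with the line spoken before it.
import json

def to_jsonl(pairs, path="spidey_finetune.jsonl"):
    with open(path, "w") as f:
        for other_line, peter_line in pairs:
            record = {
                "prompt": f"{other_line}\nPeter:",
                "completion": f" {peter_line}",
            }
            f.write(json.dumps(record) + "\n")

# The upload itself went through OpenAI's CLI at the time, e.g.:
#   openai api fine_tunes.create -t spidey_finetune.jsonl -m davinci
```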

๐Ÿ† Deepgram's Role

Deepgram plays an essential role in the whole lifecycle of the app. As the user's first point of contact with the app, Deepgram seamlessly transcribes audio in real time with barely noticeable latency, complete with punctuation, quotation marks, etc., so that the AI model can analyze the tone of each sentence.
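
For a concrete picture, here's a minimal sketch of streaming audio to Deepgram's live transcription endpoint with punctuation enabled, using the classic `websockets` client. The API key, audio source, and encoding parameters are placeholders; the real app wires this into the /chat socket shown earlier.

```python
# A minimal sketch of Deepgram live transcription with punctuation enabled,
# using the classic `websockets` client. The API key, audio source and
# encoding parameters are placeholders.
import asyncio
import json
import websockets  # pip install websockets

DEEPGRAM_URL = (
    "wss://api.deepgram.com/v1/listen"
    "?punctuate=true&encoding=linear16&sample_rate=16000"
)

async def transcribe(audio_chunks):
    """Stream raw audio chunks to Deepgram and print live transcripts."""
    headers = {"Authorization": "Token YOUR_DEEPGRAM_API_KEY"}
    async with websockets.connect(DEEPGRAM_URL, extra_headers=headers) as dg:

        async def sender():
            async for chunk in audio_chunks:  # e.g. chunks relayed from /chat
                await dg.send(chunk)
            await dg.send(json.dumps({"type": "CloseStream"}))

        async def receiver():
            async for message in dg:
                result = json.loads(message)
                alternatives = result.get("channel", {}).get("alternatives", [])
                if alternatives and alternatives[0]["transcript"]:
                    print(alternatives[0]["transcript"])

        await asyncio.gather(sender(), receiver())
```

The transcripts arriving from the receiver are what get appended to the conversation and, after a pause, handed to the completion call shown earlier.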

Therefore, Deepgram is an integral and irreplaceable part of the app due to its various awesome features.

โ˜˜๏ธ Submission Category

Analytics Ambassador: the app uses Deepgram to analyse expressions in users' voices (enhancing inputs with punctuation) and also makes it accessible for people with low vision or motor disabilities to interact with the AI without typing a word. The recording is sent in real time, and the backend analyses it to convert it into a textual transcript.

🥳 Useful Links

Here are some useful links you may want to access:

๐Ÿ– Main Links

🦾 AI Links

๐Ÿก Miscellaneous Links:

Top comments (7)

Amit Wani (@mtwn105)

This is so cool 😎😎😎💖💖💖 I'm gonna talk to my first superhero to fall in love with :D

Ivan Zaldivar (@ivanzm123)

I find it very interesting, but the demo link is broken 😞

Mohit Yadav (@just_moh_it)

Hey, I'm aware of that. The model I trained kept showing "Processing in progress" or something like that. I think it was an issue with OpenAI; on their community page, they said to wait 2-3 days, but it still didn't initialize. I think something went wrong with the training that bugged their systems. Whenever a new API call was received, the server would crash, and restarting didn't help. Even after waiting for 3 days, it was still showing the same error.

Anyway, I've replaced the trained model with a prompt-based design since it was taking too long to post-process on OpenAI's end, and though the prompted design isn't perfect, it performs much better. I also added some UI animations. It's now bug-free and working perfectly. 😊

BekahHW (@bekahhw)

Very, Very cool!

Mohit Yadav (@just_moh_it)

Thanks! I just noticed the demo was down... So if you weren't able to view the app, please check it out now 😃

tom hermans (@tomhermans)

Interested, but the demo link is broken (502)

Mohit Yadav (@just_moh_it)

It's now working. I've explained why it was broken above 😃