Mohit Yadav

Posted on Apr 11, 2022 • Edited on Jan 28, 2023

Introducing 🕷 Spitey-Sense: Simulate conversations with Spider-Man using 🪄 GPT-3 and Deepgram ☘️

#hackwithdg #deepgram #python #webdev

🚀 Introduction

Hey there! 👋 Have you ever wanted to talk with your favorite superheroes, only to find that they're either unreachable or too busy shooting their next blockbuster?

Well, with 🕷 Spitey-Sense, you can interact with a simulated version of Peter Parker! The idea is simple: OpenAI's GPT-3 was fine-tuned on scripts from various Spider-Man movies so that it could pick up Peter Parker's personality. In the end, it could predict closely what Peter Parker would have said if he were asked something.

For example, GPT-3 picks up on Peter's soft sense of humor and his feeling of responsibility. As a result, the AI generates responses to your conversations that read mostly like what Peter Parker would have said.

The goal of this project was twofold: build an interface that communicates seamlessly with the AI engine and shows responses to statements in real time over WebSockets, and train the model to act in the required way.

🪄 Demo

As soon as users open the website, they're greeted with a landing page, from which they can open the menu or dive right into the fun part (talking).

Upon navigating to the /chat route, users are asked to grant the app microphone permission so that they can interact with the AI. A WebSocket connection is opened with the backend, and the audio is monitored continuously.

If the user doesn't speak for a few moments, the conversation so far is sent to the OpenAI model, and the AI replies the way Peter Parker/Spider-Man would have.
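
The "wait for a pause, then send" step can be sketched roughly as below. This is only an illustration, not the project's actual code: the `TurnBuffer` class, its method names, and the 2-second threshold are all assumptions (the post only says "a few moments"). Timestamps are passed in explicitly so the logic is easy to follow and test.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TurnBuffer:
    """Collects transcript fragments and decides when the user's turn is over."""

    # Assumed threshold; the post only says "a few moments".
    silence_seconds: float = 2.0
    fragments: List[str] = field(default_factory=list)
    last_speech_at: Optional[float] = None

    def add_fragment(self, text: str, now: float) -> None:
        """Record a non-empty transcript fragment heard at time `now` (seconds)."""
        text = text.strip()
        if text:
            self.fragments.append(text)
            self.last_speech_at = now

    def flush_if_silent(self, now: float) -> Optional[str]:
        """Return the buffered utterance once the silence threshold has passed."""
        if self.last_speech_at is None:
            return None  # nothing has been said yet
        if now - self.last_speech_at < self.silence_seconds:
            return None  # the user may still be talking
        utterance = " ".join(self.fragments)
        self.fragments = []
        self.last_speech_at = None
        return utterance


buffer = TurnBuffer(silence_seconds=2.0)
buffer.add_fragment("Hey Peter,", now=0.0)
buffer.add_fragment("how are you?", now=1.0)
print(buffer.flush_if_silent(now=2.0))  # still within the pause window
print(buffer.flush_if_silent(now=3.5))  # pause is long enough, turn is flushed
```

Whatever `flush_if_silent` returns would then be appended to the conversation and sent to the model.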

🏋️‍♀️ Training/Fine-Tuning the AI Model

As for the model, I used text-davinci-001, since it's among the most powerful general-purpose AI models out there and the largest in the GPT-3 lineup by parameter count. It's pretty expensive, but worth it.

For the data, I scoured the internet for scripts and tried, fruitlessly, to extract information from the PDF script files I got from movie databases. It was a nightmare to classify and sort the data according to the requirements: the number of spaces, tabs, etc. had to be considered, and they often overlapped. So I continued my search for a better solution.

💿 Getting access to a dataset

After visiting many movie-database sites, I came across an article where someone had built a character simulation using a dataset from Kaggle. I had just found what I needed... I quickly downloaded the dataset, which contained script data for hundreds of titles. Once I found a superhero movie, it was time to extract the data from the .csv files.

For this purpose, I used 🐼 pandas to load the data from the multiple files, which were partitioned like a relational database, with unique and foreign keys for each movie, character, etc. I took the conversations file and extracted the plain-text lines along with the people speaking them. In the end, I got around 80-90 lines that had either Peter Parker or Spider-Man as the active speaker.
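
A minimal sketch of that kind of extraction is shown below. The column names (`character_id`, `character_name`, `line_id`, `text`) and the rows are hypothetical placeholders, not the dataset's real schema or real script lines; the point is only the join on a foreign key followed by a filter on the speaker's name.

```python
import io

import pandas as pd

# Tiny stand-ins for the dataset's CSV files. Column names and rows are
# hypothetical; the real files may be laid out differently.
CHARACTERS_CSV = """character_id,character_name,movie_id
u1,PETER PARKER,m1
u2,MARY JANE,m1
u3,SPIDER-MAN,m1
"""

LINES_CSV = """line_id,character_id,movie_id,text
L1,u2,m1,Placeholder line from another character.
L2,u1,m1,Placeholder line spoken by Peter.
L3,u2,m1,Another placeholder line.
L4,u3,m1,Placeholder line spoken in costume.
"""


def speaker_lines(characters: pd.DataFrame, lines: pd.DataFrame, names: set) -> pd.DataFrame:
    """Join lines to their speakers and keep only the requested characters."""
    merged = lines.merge(
        characters[["character_id", "character_name"]],
        on="character_id",  # foreign key linking a line to its speaker
        how="inner",
    )
    mask = merged["character_name"].str.upper().isin(names)
    return merged.loc[mask, ["line_id", "character_name", "text"]].reset_index(drop=True)


characters = pd.read_csv(io.StringIO(CHARACTERS_CSV))
lines = pd.read_csv(io.StringIO(LINES_CSV))
peter_lines = speaker_lines(characters, lines, {"PETER PARKER", "SPIDER-MAN"})
print(peter_lines)
```

With the real files, `pd.read_csv` would point at the downloaded paths instead of the in-memory strings.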

I fed them into the model, and though the wait was a few hours, I finally got the custom model up and running.
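
For anyone trying this themselves: OpenAI's GPT-3 fine-tuning guide at the time expected training data as JSON Lines, one object per line with `prompt` and `completion` fields. The sketch below shows one way to produce such a file. The exchanges, the `Person:`/`Peter:` labels, and the separator and stop choices are illustrative assumptions, not the format this project actually used.

```python
import json
from typing import List, Tuple

# Hypothetical (line said to Peter, Peter's reply) pairs; placeholders only.
EXCHANGES: List[Tuple[str, str]] = [
    ("Where were you last night?", "I got held up. It won't happen again."),
    ("Are you ever on time?", "I'm working on it, I promise."),
]

PROMPT_SUFFIX = "\nPeter:"  # assumed marker showing where the reply begins
STOP = "\n"                 # assumed stop sequence closing each completion


def to_jsonl(pairs: List[Tuple[str, str]]) -> str:
    """Serialize exchanges as one prompt/completion JSON object per line."""
    rows = []
    for said, reply in pairs:
        record = {
            "prompt": f"Person: {said}{PROMPT_SUFFIX}",
            # A leading space on the completion was recommended for GPT-3 tokenization.
            "completion": f" {reply}{STOP}",
        }
        rows.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(rows) + "\n"


print(to_jsonl(EXCHANGES), end="")
```

Each printed line is a complete JSON object, so the output can be written straight to a `.jsonl` file.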

🏆 Deepgram's Role

Deepgram plays an essential role in the whole lifecycle of the app. As the user's first point of contact with the app, Deepgram seamlessly transcribes audio in real time with almost unnoticeable latency and adds quotation marks, punctuation, etc., so that the AI model can analyze the tone of the sentence.
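
To illustrate how a backend might pick the finalized, punctuated text out of the stream, here is a rough sketch. It assumes each WebSocket message is a JSON result carrying an `is_final` flag and a `channel.alternatives[0].transcript` field, which is how Deepgram's streaming results are generally shaped; the function name and the sample messages are made up, and this is not the project's actual code.

```python
import json
from typing import Optional


def final_transcript(message: str) -> Optional[str]:
    """Return the transcript of a finalized result, or None for interim or empty ones."""
    payload = json.loads(message)
    if not payload.get("is_final", False):
        return None  # interim result; a corrected version may follow
    alternatives = payload.get("channel", {}).get("alternatives", [])
    if not alternatives:
        return None
    text = alternatives[0].get("transcript", "").strip()
    return text or None  # silence arrives as an empty transcript


# Made-up sample messages in the assumed shape.
interim = json.dumps(
    {"is_final": False, "channel": {"alternatives": [{"transcript": "hey pete"}]}}
)
final = json.dumps(
    {"is_final": True, "channel": {"alternatives": [{"transcript": "Hey, Peter!"}]}}
)

print(final_transcript(interim))
print(final_transcript(final))
```

Only the finalized text would be added to the conversation that is later sent to the model.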

Therefore, Deepgram is an integral and irreplaceable part of the app due to its various awesome features.

☘️ Submission Category

Analytics Ambassador: The app uses Deepgram to analyse expression in users' voices (it enhances inputs with punctuation), and it also lets people with low vision or motor disabilities interact with the AI without typing a word. The recording is sent in real time, and the backend converts it into a text transcript.

🥳 Useful Links

Here are some useful links you may want to access:

🏖 Main Links

Project Demo Link: Web Demo
NextJS Web Interface + API Source Code: On GitHub

🦾 AI Links

Source Code for training data: On GitHub
Fine-Tuning GPT-3 Instructions for DIY: OpenAI Guides

🏡 Miscellaneous Links

Source Dataset: From Cornell University
Design Inspiration: From Dribbble

Oldest comments (7)

Amit Wani • Apr 12 '22

This is so cool 😎😎😎💖💖💖 I m gonna talk to My first superhero to fall in love with :D

tom hermans • Apr 12 '22

Interested but demo link results in a broken link (502)

Mohit Yadav • Apr 16 '22

It's now working. I've explained why it was broken above 😃

Ivan Zaldivar • Apr 12 '22

I find it very interesting, but the demo link is broken 😞

Mohit Yadav • Apr 16 '22

Hey, I'm aware of it. The model I trained was always showing "Processing in progress" or something like that. I think it was an issue with OpenAI. On their community page, they said to wait 2-3 days, but it still didn't initialize. I think I got something wrong with the training that bugged their systems. Whenever a new API call was received, the server would crash, and restarting didn't help. Even after waiting for 3 days, it was still showing the same error.

Anyway, I've replaced the trained model with a prompted design, since the trained model was taking too long to post-process on OpenAI's end. Though the prompted design is not perfect, it works much better performance-wise. I also added some UI animations. It's now bug-free and working perfectly. 😊

BekahHW • Apr 22 '22

Very, Very cool!

Mohit Yadav • Apr 23 '22

Thanks! I just noticed the demo was down... So if you weren't able to view the app, please check it out now 😃