RU_PlayingWithTech

Posted on Jan 11, 2021

Building a Netflix Actor Identifier - Playing with AWS Rekognition, Lambda, API Gateway and Chrome Extensions

#aws #rekognition #lambda #chromeextension

Background

I was talking to a friend about how cool Amazon Prime Video's X-Ray feature is, and how it shows the actors who are currently in the scene. There have been so many moments where you recognise an actor but just can't place them, and searching IMDB still takes a long time, so this X-Ray feature is amazing.

I am currently studying for the AWS Developer Associate certification and came across Amazon Rekognition, which has a celebrity recognition feature. I thought: let's try to integrate this with Netflix, so we can detect celebrities and get something very similar to Amazon Prime Video's X-Ray.

High Level Design

We need to get an image of the current screen in Netflix and send it off to AWS Rekognition. We can then take the data AWS Rekognition returns and display it somewhere. A Chrome Extension would be perfect for this: it can take a screenshot, and since it essentially runs its own web app, it can send a request to the service and then display the necessary info in the popup it provides. (Also, I haven't played around with Chrome Extensions before, so this would be a perfect excuse.)

I am also learning API Gateway and AWS Lambda as part of the certification course. Those would be very useful tools here, giving a nice end-to-end flow without having to run a server. Connecting the Chrome Extension directly to AWS Rekognition also wouldn't be secure, as you would have to put your AWS config details in the front end for anyone to see.

Building The Solution

Breaking down the steps

The first thing I like to do is break down this very large problem into smaller ones and then put it all together at the end.

For this process, I figured out the steps would be:

How to make an AWS Rekognition Request
How to use AWS Lambda to make that request
How to send the data across to AWS Lambda from API Gateway
How to use chrome extensions to create a screenshot
How to send that screenshotted data to API Gateway
How to format the returned data

Step 1 - Making an AWS Rekognition Request

To quickly do a proof of concept, I took a screenshot of Friends on Netflix, logged into the AWS console and went to the AWS Rekognition service. Once I uploaded the image, it identified David Schwimmer with 100% confidence, which made me fairly confident that this would work.

It also provided the JSON Request and Response which helped me further understand what I needed to do. The response also had the IMDB link which was an added bonus.

I then read through the documentation. There were really only 2 ways to supply the image:

Using an S3 Bucket
Sending the image as bytes

Using an S3 bucket would not be the best idea, as you would have to make a call to upload to S3, store that object's location somewhere and then create the request. You would then have to delete the image from S3, as it won't be used anywhere else and there is no point in keeping it. This would also incur extra charges, so S3 was a no go. The only way was to provide the image as bytes.
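To make the two options concrete, here is a minimal sketch of the two shapes the Image argument can take. The helper name build_image_param, the bucket name and the file name are made up for illustration; only the "Bytes" and "S3Object" keys come from the Rekognition API.

```python
def build_image_param(image_bytes=None, bucket=None, key=None):
    """Build the Image argument for a Rekognition call.

    Exactly one source must be given: raw bytes, or an S3 bucket/key pair.
    """
    if image_bytes is not None and bucket is None and key is None:
        return {"Bytes": image_bytes}
    if image_bytes is None and bucket and key:
        return {"S3Object": {"Bucket": bucket, "Name": key}}
    raise ValueError("Pass either image_bytes, or both bucket and key")


# Inline bytes: nothing to upload, store, or clean up afterwards.
print(build_image_param(image_bytes=b"\xff\xd8\xff"))
# S3 reference: the object would have to be uploaded first and deleted later.
# (Hypothetical bucket and key names.)
print(build_image_param(bucket="my-example-bucket", key="screenshot.jpg"))
```

The bytes form keeps the whole flow to a single request, which is why it suits a throwaway screenshot.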

So now that we know what to do, the next step is to make the call in Lambda.

Step 2 - Using AWS Lambda to make the request

I opened up AWS Lambda and pressed create function. Rather than creating a function from scratch, we can start from a blueprint and modify it as necessary. Luckily for us, there is a Lambda Rekognition blueprint which is perfect to analyse and understand.

Once we select that, we need to ensure the function's role has permission to use Rekognition's celebrity recognition. The Amazon Rekognition read-only permission should cover it. There is no need for the S3 trigger, so we will leave that as it is.

Looking at the template code, there are a few helper functions. None of them does celebrity recognition, but they do show how to use the rekognition library (which has already been imported). All we have to do is a simple
response = rekognition.recognize_celebrities(Image={"Bytes": decodedImage}) and just return that response.

Now we need to provide the decoded image to that function and everything will work. I first thought we could just send raw binary through the API and have Lambda read it, but that is not really best practice. The recommended option is to send the image as a base64 string and decode it in Lambda.

To get test data, I just googled an image to base64 converter. I found this site and uploaded the Friends screenshot. Just to confirm it had been converted properly, I used a different site to convert it back and check that the image was correct. All is working :)

I created a quick JSON test event in Lambda with the format {"imageInBase64":} and passed that value straight into the recognize_celebrities function. However, I immediately got an InvalidImageValue, which I thought was a little weird as the image was fine when I converted it back.

The issue was that the data was still a base64 string when it should have been raw bytes. So I imported the base64 library, which has the b64decode function, and converted the data. However, now I got a different error: Incorrect Padding. Thanks to this Stack Overflow answer, I added the correct padding and ran it.

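import base64
import boto3

rekognition = boto3.client('rekognition')

def lambda_handler(event, context):
    # The event carries the screenshot as a base64 string
    encodedImage = event['imageInBase64']
    # Extra '=' padding avoids the "Incorrect padding" error
    decodedImage = base64.b64decode(encodedImage + '===')
    response = rekognition.recognize_celebrities(Image={"Bytes": decodedImage})
    return response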
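As an alternative to appending a fixed run of '=' characters, the missing padding can be computed from the string length. This is only a sketch of that idea, not the code from the repo; decode_base64_image is a hypothetical helper name.

```python
import base64


def decode_base64_image(encoded):
    """Decode a base64 string, adding any '=' padding that was stripped."""
    # Valid base64 has a length that is a multiple of 4;
    # work out how many '=' characters are missing.
    missing = -len(encoded) % 4
    return base64.b64decode(encoded + "=" * missing)


full = base64.b64encode(b"hello").decode("ascii")   # padded form
stripped = full.rstrip("=")                         # padding removed
print(decode_base64_image(stripped))                # b'hello'
```

Both the padded and the stripped form decode to the same bytes, so the helper is safe to call whether or not the sender kept the padding.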

Success!! I received a correct response with David Schwimmer's name and the IMDB link. Sub-Problem Solved!

Step 3 - Using API Gateway to Call Lambda

This part was easier. I just created a new API with mainly defaults, created a resource called ImageRecognition and linked it to the Lambda function. I then deployed it, tested it with Postman and received a correct response! Now the AWS configuration is pretty much complete; I just need to create the Chrome Extension.
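One thing to watch for: the post does not say which integration type was used. If the method uses Lambda proxy integration (an assumption), the POST body arrives as a JSON string under event["body"] rather than as top-level keys, and the function must return a statusCode and a string body. A rough sketch of that shape, with the Rekognition client passed in so it can run without AWS credentials (handle_proxy_event and FakeRekognition are hypothetical names):

```python
import base64
import json


def handle_proxy_event(event, rekognition_client):
    """Handle an API Gateway Lambda-proxy event and return a proxy response."""
    # With proxy integration the POST body arrives as a JSON *string*.
    payload = json.loads(event["body"])
    image_bytes = base64.b64decode(payload["imageInBase64"])
    result = rekognition_client.recognize_celebrities(Image={"Bytes": image_bytes})
    # The response body must also be a string, so serialise it back to JSON.
    return {"statusCode": 200, "body": json.dumps(result)}


class FakeRekognition:
    """Stand-in client so the sketch runs locally; returns made-up data."""

    def recognize_celebrities(self, Image):
        return {"CelebrityFaces": [{"Name": "Example Person"}], "UnrecognizedFaces": []}


event = {"body": json.dumps({"imageInBase64": base64.b64encode(b"fake").decode()})}
print(handle_proxy_event(event, FakeRekognition()))
```

In a real function the second argument would be boto3.client('rekognition') instead of the fake.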

Step 4 - Using Chrome Extension to Create A Screenshot

Luckily Google Chrome provides quite a few samples, one of which takes a screenshot. I looked into that code and it didn't seem too complex. I would have to implement a chrome.tabs.captureVisibleTab(function(data) {...}) handler, which takes the screenshot and then handles it. The sample also opened the screenshot in a separate tab. A quick test showed that it works and that it provides the data in base64 format, which is exactly what I wanted. The data included a 'data:image/jpeg;base64,' prefix, which can easily be removed, leaving just the base64 data. Again I used that code, extracted the base64 and posted it in Postman to see if it works, and it does!
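The prefix can be stripped in the extension before sending, but the same clean-up could also be done defensively on the Lambda side. A small sketch of that, assuming the value is either a data URL or plain base64 (strip_data_url_prefix is a hypothetical helper, not part of the original code):

```python
def strip_data_url_prefix(value):
    """Return only the base64 payload of a data URL; pass plain base64 through."""
    if value.startswith("data:"):
        # A data URL looks like 'data:<mime>;base64,<payload>'.
        header, _, payload = value.partition(",")
        if not header.endswith(";base64"):
            raise ValueError("Expected a base64-encoded data URL")
        return payload
    return value


print(strip_data_url_prefix("data:image/jpeg;base64,/9j/4AAQ"))  # /9j/4AAQ
```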

However, I didn't want it to be just a browserAction; it should be a proper popup. So in the manifest.json, I changed the background js to be a popup js, which runs every time the popup gets opened.

Step 5 - Sending that screenshotted data to API Gateway

Now that I can get the data, I just need to use the fetch command to post the screenshot data over. However, when doing that, I was getting a CORS (cross-origin resource sharing) issue, as the request was coming from a different origin. Chrome blocks it, while Postman does not. To fix this, I went over to API Gateway, went to the method and enabled CORS. Now it works! I can take a screenshot, send it to API Gateway and get the data back as JSON.
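Enabling CORS in the console takes care of the preflight request. If the method uses Lambda proxy integration (again an assumption, the post does not say), the function's own response also has to carry the CORS headers. A minimal sketch of such a response builder; with_cors and ALLOWED_ORIGIN are hypothetical names:

```python
import json

# Assumption: "*" keeps the sketch simple; a real deployment would likely
# list the extension's own origin instead.
ALLOWED_ORIGIN = "*"


def with_cors(status_code, payload):
    """Wrap a payload in a proxy-style response that carries CORS headers."""
    return {
        "statusCode": status_code,
        "headers": {
            "Access-Control-Allow-Origin": ALLOWED_ORIGIN,
            "Access-Control-Allow-Headers": "Content-Type",
            "Access-Control-Allow-Methods": "OPTIONS,POST",
        },
        "body": json.dumps(payload),
    }


print(with_cors(200, {"CelebrityFaces": []}))
```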

Step 6 - Formatting the returned data

In my popup html, I created a simple HTML web page (no need for anything fancy, as I just wanted an MVP that displays the result). All it essentially does is show the screenshot and then provide a table of the recognised faces. Since I was not using a proper framework like React or Angular, I had to fall back on basic JS document manipulation, which I haven't done in a while. After a bit of tinkering, I managed to get the returned data and format it in a table. It looked a little ugly, so I just used Materialize css, which is based on Google's Material Design. I am not the best at making things look pretty; I will leave that to my amazing UX Designer friends. I just wanted to get the functionality working.
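The popup does this shaping in JS, but the same step could be done in the Lambda so the extension receives only what the table needs. A sketch of that idea in Python, using the CelebrityFaces, Name, MatchConfidence and Urls fields of the Rekognition response; summarise_celebrities is a hypothetical helper and the sample values below are placeholders, not real output.

```python
def summarise_celebrities(response):
    """Turn a recognize_celebrities response into simple table rows."""
    rows = []
    for face in response.get("CelebrityFaces", []):
        urls = face.get("Urls") or []
        rows.append({
            "name": face.get("Name", "Unknown"),
            "confidence": round(face.get("MatchConfidence", 0.0), 1),
            "link": urls[0] if urls else None,
        })
    # Most confident matches first.
    return sorted(rows, key=lambda row: row["confidence"], reverse=True)


# Placeholder data in the shape of a real response.
sample = {
    "CelebrityFaces": [
        {"Name": "Actor B", "MatchConfidence": 87.26, "Urls": []},
        {"Name": "Actor A", "MatchConfidence": 99.5,
         "Urls": ["www.imdb.com/name/nm0000000"]},
    ],
    "UnrecognizedFaces": [],
}
for row in summarise_celebrities(sample):
    print(row)
```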

Moment Of Truth

I went onto one of my other favourite shows, Brooklyn Nine-Nine, and decided to test it out, and it works amazingly.

Github Link - https://github.com/rudaiyap/Netflix-Celeb-Recognition

Things that could probably be improved on if I was to continue working on it

Improve Error Handling - There is minimal error handling currently, which will lead to a bad user experience
Improve Security - I will need to look further into how API Gateway can be secured so that only certain applications can access it. Right now anyone who has the URL can call it, which can lead to excessive AWS charges
Avoid firing a request every time the popup is opened, as this leads to unnecessary requests being sent. I would keep the screenshot in memory and let the user click a button to send the request manually when they want to
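For the error handling point, one possible starting place is validating the request in the Lambda and returning a 400 instead of letting the function crash. This is only a sketch, assuming Lambda proxy integration and the imageInBase64 field used earlier; parse_image_request and bad_request are hypothetical names.

```python
import base64
import json


def bad_request(message):
    """Build a 400 response in the Lambda proxy format."""
    return {"statusCode": 400, "body": json.dumps({"error": message})}


def parse_image_request(event):
    """Return (image_bytes, None) on success, or (None, error_response)."""
    try:
        payload = json.loads(event.get("body") or "")
        # validate=True rejects characters outside the base64 alphabet.
        image_bytes = base64.b64decode(payload["imageInBase64"], validate=True)
    except (ValueError, KeyError, TypeError):
        # Covers malformed JSON, a missing field, and invalid base64.
        return None, bad_request("Body must be JSON with a base64 'imageInBase64' field")
    if not image_bytes:
        return None, bad_request("Image is empty")
    return image_bytes, None


good = {"body": json.dumps({"imageInBase64": "aGVsbG8="})}
print(parse_image_request(good))
print(parse_image_request({"body": "not json"}))
```

The extension can then show the error message from the body rather than failing silently.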

Issues and other notes

The recognition isn't perfect. It really only works when you get a full, front-on view of the face. I don't think I can do much about that, as it relies heavily on what AWS Rekognition can do
This can be used on all sites. That is not an issue, but if I wanted it to work only on certain sites, I could restrict the extension and the API Gateway to only allow those sites
This really only works when viewing Netflix in a browser and not on a TV. I am not too familiar with any TV APIs that would let you send screenshots, but if that becomes possible, the same API should still be callable.

Conclusion

This was really fun to try and create, and I have definitely learnt a lot with AWS Rekognition, Lambda and API Gateway. Gotta play with more of AWS services next time :)

DEV Community