<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Lucas de Ávila Martins</title>
    <description>The latest articles on DEV Community by Lucas de Ávila Martins (@lucasavila00_39).</description>
    <link>https://dev.to/lucasavila00_39</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F304467%2F5655bfeb-fcee-425f-be98-187909789b20.jpeg</url>
      <title>DEV Community: Lucas de Ávila Martins</title>
      <link>https://dev.to/lucasavila00_39</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/lucasavila00_39"/>
    <language>en</language>
    <item>
      <title>An open source clone of Instagram/Snapchat filters on the web with JavaScript</title>
      <dc:creator>Lucas de Ávila Martins</dc:creator>
      <pubDate>Mon, 30 Dec 2019 20:38:46 +0000</pubDate>
      <link>https://dev.to/lucasavila00_39/an-open-source-clone-of-instragram-snapchat-filters-on-the-web-with-javascript-277o</link>
      <guid>https://dev.to/lucasavila00_39/an-open-source-clone-of-instragram-snapchat-filters-on-the-web-with-javascript-277o</guid>
      <description>&lt;p&gt;When I first saw Instagram's and Snapchat's filters I thought they were all &lt;em&gt;magic&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Later I came to know that they are powered by &lt;em&gt;AI&lt;/em&gt; and &lt;em&gt;3D CGI&lt;/em&gt;. But that still doesn't explain much, right?&lt;/p&gt;

&lt;p&gt;To build a filter you need to do three things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Find the face&lt;/li&gt;
&lt;li&gt;Put stuff on the face&lt;/li&gt;
&lt;li&gt;Add color to the effect&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So let's dig into it!&lt;/p&gt;

&lt;h1&gt;
  Find the face
&lt;/h1&gt;

&lt;p&gt;What I mean by finding the face: locating its position and rotation in three dimensions. If you look around you will probably see this referred to as estimating the &lt;a href="https://www.researchgate.net/publication/321682530_Real-Time_Monocular_6-DoF_Head_Pose_Estimation_from_Salient_2D_Points"&gt;head pose with 6 degrees of freedom&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The approach I used is the one based on this &lt;a href="https://www.learnopencv.com/head-pose-estimation-using-opencv-and-dlib/"&gt;blog post&lt;/a&gt; and it goes like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Locate certain keypoints (nose tip position, left eye position, etc.) in the image.&lt;/li&gt;
&lt;li&gt;Given an approximated 3D representation of the face, solve the &lt;a href="https://en.wikipedia.org/wiki/Perspective-n-Point"&gt;Perspective-n-Point&lt;/a&gt; problem and get the face's rotation and translation in 3D.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  Locate keypoints
&lt;/h2&gt;

&lt;p&gt;For this task I'm using an &lt;strong&gt;AWESOME&lt;/strong&gt; library called &lt;a href="https://github.com/justadudewhohacks/face-api.js/"&gt;face-api.js&lt;/a&gt;. You give it an image or a video and it returns the positions of 68 keypoints on a human face.&lt;/p&gt;

&lt;p&gt;The way it works is best explained at the &lt;a href="https://github.com/justadudewhohacks/face-api.js/"&gt;project's page&lt;/a&gt; but in short:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Find where the face is in the image (the blue square on the right side of the gif); this is done by running the image through a neural network with TensorFlow.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Feed the cropped face to another neural network, which outputs the positions of the keypoints.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
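&lt;p&gt;The head-pose recipe from the blog post linked above only needs a handful of those 68 landmarks. As a minimal sketch (the &lt;code&gt;pickPnpKeypoints&lt;/code&gt; helper is mine, not part of face-api.js; the indices follow the standard dlib 68-point layout that face-api.js also uses):&lt;/p&gt;

```typescript
// Indices into the standard dlib 68-point landmark layout:
// nose tip, chin, outer eye corners and mouth corners - the six
// points commonly fed to solvePnP for head-pose estimation.
// (This helper is illustrative, not part of face-api.js.)
interface Point { x: number; y: number; }

const PNP_INDICES = {
  noseTip: 30,
  chin: 8,
  leftEyeOuter: 36,
  rightEyeOuter: 45,
  leftMouthCorner: 48,
  rightMouthCorner: 54,
};

// Pick the six 2D points out of the full 68-landmark array.
export const pickPnpKeypoints = (landmarks: Point[]): Point[] => {
  if (landmarks.length !== 68) {
    throw new Error("expected 68 landmarks");
  }
  return [
    PNP_INDICES.noseTip,
    PNP_INDICES.chin,
    PNP_INDICES.leftEyeOuter,
    PNP_INDICES.rightEyeOuter,
    PNP_INDICES.leftMouthCorner,
    PNP_INDICES.rightMouthCorner,
  ].map((i) => landmarks[i]);
};
```

&lt;p&gt;The matching six 3D points (nose tip, chin, eye corners, mouth corners) are the ones measured on the face model in the next step.&lt;/p&gt;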

&lt;h2&gt;
  Solve Perspective-n-Point
&lt;/h2&gt;

&lt;p&gt;Given where the keypoints are, we can take an estimated 3D model of the human face and try to rotate and move it around so that its projection matches the one observed.&lt;/p&gt;

&lt;p&gt;Strictly speaking, we only need a list of the 3D points that correspond to the 2D ones observed in the image; we don't need a full 3D model at all.&lt;/p&gt;

&lt;p&gt;But, of course, having a 3D model makes our life easier, because getting those 3D points then becomes a matter of measuring it.&lt;/p&gt;

&lt;p&gt;I moved a cube to each desired point and then copied the location that Blender (or any other 3D modelling software) reported for the object.&lt;/p&gt;

&lt;p&gt;We also need to know some parameters of the camera (focal length, center of projection, etc.), but we can just approximate them and it works great.&lt;/p&gt;

&lt;p&gt;Now feed your 3D points and 2D points to something like &lt;a href="https://docs.opencv.org/3.4/d9/d0c/group__calib3d.html#ga549c2075fac14829ff4a58bc931c033d"&gt;OpenCV's solvePnP&lt;/a&gt; and you're done. It will give you rotation and translation values that, when applied to the object in 3D, would produce the same projection.&lt;/p&gt;
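&lt;p&gt;To make the objective concrete, here is what "produce the same projection" means in plain TypeScript. This is a sketch of the pinhole camera model, not OpenCV's API; &lt;code&gt;approxIntrinsics&lt;/code&gt; encodes the usual "focal length = image width, center of projection = image center" approximation (names are mine):&lt;/p&gt;

```typescript
type Vec3 = [number, number, number];
type Mat3 = [Vec3, Vec3, Vec3];

// Project a 3D model point into the image: p' = R*p + t, then
// perspective-divide by depth and scale by the focal length
// around the principal point. solvePnP searches for the R and t
// that make these projections line up with the observed 2D keypoints.
export const project = (
  R: Mat3,
  t: Vec3,
  p: Vec3,
  fx: number,
  fy: number,
  cx: number,
  cy: number,
): [number, number] => {
  const x = R[0][0] * p[0] + R[0][1] * p[1] + R[0][2] * p[2] + t[0];
  const y = R[1][0] * p[0] + R[1][1] * p[1] + R[1][2] * p[2] + t[1];
  const z = R[2][0] * p[0] + R[2][1] * p[1] + R[2][2] * p[2] + t[2];
  return [cx + (fx * x) / z, cy + (fy * y) / z];
};

// A common approximation when the real camera is unknown:
// focal length = image width, principal point = image center.
export const approxIntrinsics = (width: number, height: number) => ({
  fx: width,
  fy: width,
  cx: width / 2,
  cy: height / 2,
});
```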

&lt;p&gt;The only problem I ran into with this approach was that compiling OpenCV to WASM currently produces a binary blob of ~1MB plus 300k of JS, and that's after spending a whole day trying to decrease the size (it started at around 4MB).&lt;/p&gt;

&lt;p&gt;I didn't want to download and parse all of that just to run one function on a user's mobile phone.&lt;/p&gt;

&lt;p&gt;That's why &lt;a href="https://filtrou.me"&gt;Filtrou.me&lt;/a&gt; uses another AI to solve the PnP. &lt;a href="https://filtrou.me/solve-pnp"&gt;If you're interested in the details of this AI read the next blog post.&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  Put stuff on the face
&lt;/h1&gt;

&lt;p&gt;Great! We now know the rotation and translation to apply to whatever we want to draw over the face.&lt;/p&gt;

&lt;p&gt;So let's do it! This couldn't be easier.&lt;/p&gt;

&lt;p&gt;We use &lt;a href="https://threejs.org/"&gt;three.js&lt;/a&gt; to create a scene, camera and an object.&lt;/p&gt;

&lt;p&gt;Then we apply the rotation and translation given in the previous step to this object:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;onResults&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;q&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;THREE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Quaternion&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;y&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;threeObject&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rotation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;setFromQuaternion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;q&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="c1"&gt;// if you're reading Filtrou.me's source code you'll see that&lt;/span&gt;
  &lt;span class="c1"&gt;// y coordinate is corrected given the video aspect ratio.&lt;/span&gt;
  &lt;span class="c1"&gt;// thats because the solvePnP AI sees the video as a square&lt;/span&gt;
  &lt;span class="c1"&gt;// and we're displaying it with diferent aspect ratios there.&lt;/span&gt;
  &lt;span class="c1"&gt;// If you use OpenCV's solvePnP or a square video with solvePnP AI&lt;/span&gt;
  &lt;span class="c1"&gt;// then the correction won't be needed.&lt;/span&gt;
  &lt;span class="nx"&gt;threeObject&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;position&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
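&lt;p&gt;If you're curious what &lt;code&gt;setFromQuaternion&lt;/code&gt; does under the hood, a unit quaternion rotates a vector like this (a self-contained sketch, independent of three.js):&lt;/p&gt;

```typescript
type Quat = { x: number; y: number; z: number; w: number };
type V3 = [number, number, number];

// Cross product of two 3D vectors.
const cross = (a: V3, b: V3): V3 => [
  a[1] * b[2] - a[2] * b[1],
  a[2] * b[0] - a[0] * b[2],
  a[0] * b[1] - a[1] * b[0],
];

// Rotate vector v by unit quaternion q, i.e. compute q * v * q^-1,
// expanded into the usual cross-product form:
// v' = v + 2w(u x v) + 2(u x (u x v)), where u = (q.x, q.y, q.z).
export const rotateByQuaternion = (q: Quat, v: V3): V3 => {
  const u: V3 = [q.x, q.y, q.z];
  const c1 = cross(u, v);
  const c2 = cross(u, c1);
  return [
    v[0] + 2 * (q.w * c1[0] + c2[0]),
    v[1] + 2 * (q.w * c1[1] + c2[1]),
    v[2] + 2 * (q.w * c1[2] + c2[2]),
  ];
};
```

&lt;p&gt;For example, a quaternion encoding a 90-degree turn about the z axis maps the x axis onto the y axis, which is what three.js computes internally when you call &lt;code&gt;rotation.setFromQuaternion(q)&lt;/code&gt;.&lt;/p&gt;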



&lt;p&gt;We should set the three.js camera's FOV to match that of the camera the picture was taken with.&lt;/p&gt;

&lt;p&gt;But since we don't know it exactly, an approximation is fine.&lt;/p&gt;

&lt;p&gt;Using 45 degrees works fine if the video is square; otherwise it needs to be corrected for the image's aspect ratio.&lt;/p&gt;
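&lt;p&gt;One way to pick that value (my own back-of-the-envelope math, not something taken from the libraries involved): for a pinhole camera, the vertical FOV that three.js' &lt;code&gt;PerspectiveCamera&lt;/code&gt; expects follows from the focal length and the image height:&lt;/p&gt;

```typescript
// Vertical field of view (in degrees) of a pinhole camera with
// focal length focalPx and image height heightPx, both in pixels:
// fovY = 2 * atan(h / (2f)). three.js' PerspectiveCamera takes the
// vertical FOV, so this is the number to plug in.
export const verticalFovDeg = (focalPx: number, heightPx: number): number =>
  (2 * Math.atan(heightPx / (2 * focalPx)) * 180) / Math.PI;
```

&lt;p&gt;With the common "focal length = image width" approximation, a square image gives about 53 degrees, in the same ballpark as the 45-degree guess, while a 640x480 image gives about 41 degrees, which is one way to correct for the aspect ratio.&lt;/p&gt;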

&lt;h1&gt;
  Add colors to the effect
&lt;/h1&gt;

&lt;p&gt;Once again, three.js comes to the rescue.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.npmjs.com/package/postprocessing"&gt;There is an awesome library called postprocessing that basically has everything done for you.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here at &lt;a href="https://filtrou.me"&gt;Filtrou.me&lt;/a&gt; we use it to apply &lt;a href="https://threejsfundamentals.org/threejs/lessons/threejs-post-processing-3dlut.html"&gt;color adjustments&lt;/a&gt; based on &lt;a href="https://www.computerhope.com/jargon/c/clut.htm"&gt;Color Look-Up Tables&lt;/a&gt; created in Adobe Photoshop.&lt;/p&gt;

&lt;h1&gt;
  See it in action
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://filtrou.me/hpbb"&gt;A published filter on Filtrou.me&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  Questions?
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://github.com/lucasavila00/filtroume"&gt;Take a look at Filtrou.me source code&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://twitter.com/lucasavila00"&gt;Talk to me on Twitter&lt;/a&gt;&lt;/p&gt;

</description>
      <category>javascript</category>
      <category>typescript</category>
    </item>
  </channel>
</rss>
