Smile Detector - using OpenCV to detect smiling faces in a video

Matt Hamilton ・ 4 min read

Have you ever tried to take a snapshot of a video, only to get an image where everyone is looking glum?

Wouldn't it be great if somehow you could find the "smiliest" frame from a video and extract that for the snapshot?

Well, that is what I try to build in this video!

A 1080p version of the video is available on Cinnamon

This is a video taken from my weekly show "ML for Everyone" that broadcasts live on Twitch every Tuesday at 2pm UK time.

To detect the faces and the smiles we are using the open source computer vision library OpenCV.


The purpose of this code is to be used with Choirless, a project I'm working on for Call for Code. We need to extract thumbnail images of the rendered choir to display in the UI. At the moment we take the first frame of the video, but that is generally not the best frame to use; later frames, with people smiling as they sing, are better.


Through the course of the coding session I developed a solution incrementally: first starting with something that just detected faces in each frame, then developing that further to detect smiles within those faces, then optimizing it to only process key frames so it runs quicker. The main parts of the code are:

The Face detector

This code uses the detectMultiScale method of the face classifier to first find faces in the frame. Once those are found, within each "region of interest", we look for smiles using the smile detector.

def detect(gray, frame):
    # detect faces within the greyscale version of the frame
    faces = face_cascade.detectMultiScale(gray, 1.1, 3)
    num_smiles = 0

    # For each face we find...
    for (x, y, w, h) in faces:
        if args.verbose: # draw rectangle if in verbose mode
            cv2.rectangle(frame, (x, y), ((x + w), (y + h)), (255, 0, 0), 2)

        # Calculate the "region of interest", ie the area of the frame
        # containing the face
        roi_gray = gray[y:y+h, x:x+w]
        roi_color = frame[y:y+h, x:x+w]

        # Within the grayscale ROI, look for smiles 
        smiles = smile_cascade.detectMultiScale(roi_gray, 1.05, 6)

        # If we find smiles then increment our counter
        if len(smiles):
            num_smiles += 1

        # If verbose, draw a rectangle on the image indicating where the smile was found
        if args.verbose:
            for (sx, sy, sw, sh) in smiles:
                cv2.rectangle(roi_color, (sx, sy), ((sx + sw), (sy + sh)), (0, 0, 255), 2)

    return num_smiles, frame

In the main body of the code we open the video file, loop through each frame of the video, and pass it to the `detect` function above to count the number of smiles in that frame. We keep track of the "best" frame: the one with the most smiles found so far:

# Keep track of best frame and the high water mark of
# smiles found in each frame
best_image = prev_frame
max_smiles = -1

while 1:
    # Read each frame
    ret, frame = cap.read()

    # End of file, so break loop
    if not ret:
        break

    # Calculate the difference of frame to previous one
    diff = cv2.absdiff(frame, prev_frame)
    non_zero_count = np.count_nonzero(diff)

    # If not "different enough" then short circuit this loop
    if non_zero_count < thresh:
        prev_frame = frame
        continue

    # Convert the frame to grayscale
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Call the detector function
    num_smiles, image = detect(gray, frame.copy())

    # Check if we have more smiles in this frame
    # than our "best" frame
    if num_smiles > max_smiles:
        max_smiles = num_smiles
        best_image = image
        # If verbose then show the image to console
        if args.verbose:
            cv2.imshow('Video', best_image)

    prev_frame = frame

There is also an optimization step in which we perform a preliminary loop to see how different each frame is from its predecessor. We can then calculate a threshold so that we only process the 5% of frames that differ most from the frame before them.
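That pre-analysis pass isn't shown above, but the idea can be sketched with NumPy alone (the real code uses `cv2.absdiff`; the function name here is illustrative):

```python
import numpy as np

def difference_threshold(frames, keep_pct=5):
    """Given a sequence of frames (uint8 arrays), count the non-zero
    pixels in each frame's difference from its predecessor, and return
    the threshold above which only the most-different keep_pct percent
    of frames fall."""
    counts = []
    prev = frames[0]
    for frame in frames[1:]:
        # Absolute difference; cast to int16 first so uint8 doesn't wrap
        diff = np.abs(frame.astype(np.int16) - prev.astype(np.int16))
        counts.append(np.count_nonzero(diff))
        prev = frame
    # Frames whose count exceeds this percentile are "different enough"
    return np.percentile(counts, 100 - keep_pct)
```

The main loop then skips any frame whose difference count falls below this threshold, so the expensive cascade detection only runs on frames where something has actually changed.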

The full code is available on GitHub at:


It works pretty well. If you run the code with the --verbose flag, it will display each new "best" frame on screen, and the final output image will have rectangles drawn on it so you can see what was detected:

% python --verbose test.mp4 thumbnail-rectangles.jpg
Pre analysis stage
Threshold: 483384.6
Smile detection stage
Number of smiles found: 9


Faces and smiles detected in image

As you can see though, it wasn't perfect and it detected what it thought were smiles in some of the images of instruments.

Next week I'll be trying a slightly different approach: training a Convolutional Neural Network (CNN) to detect happy faces. Find out more on the event page:

I hope you enjoyed the video, if you want to catch them live, I stream each week at 2pm UK time on the IBM Developer Twitch channel:

