A Story of Face Recognition using Python

#python #deeplearning #facerecognition

Introduction to the Problem

I recently found myself in a position where I had to find pictures of myself in a set of ~1700 photos. I decided that coding my way out of it was the only sensible solution, as I do not have the temperament to sit and look at that many pictures. This would also allow me to play with Python and do some deep learning, which I don't necessarily get to do as a web developer. Over the weekend, I wrote a small program that leverages face detection and recognition to find images of me in a large dataset and copy them into another directory.

Prerequisites

To recreate this experiment, you will need to be aware of the following libraries:

face_recognition: An awesome and very simple to use library that abstracts away all the complexity involved in face detection and face recognition.
image_to_numpy: Yet another awesome library, written by the same author as above. It is used to load image files into NumPy arrays (more details later on).
opencv-python: Python bindings for the OpenCV library

Setup

Project Directory

./
│   main.py
│   logger.py    
│
└───face_found
│   │   image.JPG
│   │   _image.JPG
│   │   ...
│
└───images
│   │   image.JPG
│   │   ...
│
└───source_dataset
│   │   image.JPG
│   │   ...
│

main.py contains the core of this project.
logger.py is a simple logger service that prints out colored logs based on message severity. You can check it out here!
images directory contains all the images in .JPG format.
source_dataset directory contains sample images of the person whose face needs to be searched, in .JPG format.
face_found directory is an image dropbox for the search results.

Code Flow

The program first iterates through the source_dataset directory. For each image in the directory, the face encodings are extracted and stored inside an array. Let's call these "Known Faces". The code then proceeds to iterate through the images directory. For each image, the face locations of each face are extracted. These locations are then used to extract the face encodings of each face. Let's call these "Unknown Faces". Each unknown face will now be compared against all the known faces to determine whether there is any similarity. If a similarity exists, the image will be stored inside the face_found directory.

Code Explanation

"""INIT"""
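import os
import time
from shutil import copy

import cv2
import face_recognition
import image_to_numpy

from logger import Logger  # the local logger.py from the project directory

SOURCE_DATA_DIR = "source_dataset/"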
IMAGE_SET_DIR = "images/"
FACE_FOUND_DIR = "face_found/"
TOLERANCE = 0.5
FRAME_THICKNESS = 4
COLOR = [0, 0, 255]
DETECTION_MODEL = "hog"

log = Logger(os.path.basename(__file__))

source_faces = []  # known faces

start = time.perf_counter()

This part of the code sets up the basic conditions for the script. Things to notice here are TOLERANCE, DETECTION_MODEL and time.perf_counter().

TOLERANCE is a measure of how strict the comparison should be (distance between a known face and an unknown face). A lower value is more strict whereas a higher value is more tolerant. The documentation for face_recognition mentions that a value of 0.6 is typical for best performance. I initially tried 0.7 and then 0.6. These resulted in some inconsistent matches. As a result, I settled for 0.5.
DETECTION_MODEL selects the pre-trained model used to perform face detection. For this project, I initially considered the CNN (Convolutional Neural Network) model, a type of neural network used primarily in computer vision. To use it, we need a CUDA-enabled GPU, and dlib (the machine learning library that face_recognition is built on) must be able to recognize this hardware.
```
import dlib
print(dlib.cuda.get_num_devices())
```
If the above code snippet gives us a value of >= 1, then we can proceed to use CNN. We can check whether dlib was built with GPU support using print(dlib.DLIB_USE_CUDA). This should ideally return True. If it doesn't, setting dlib.DLIB_USE_CUDA = True by hand is tempting, but as far as I can tell the flag only reports how dlib was compiled, so the more reliable fix is to reinstall dlib with CUDA support enabled. However, in my case, I was getting MemoryError: std::bad_alloc when attempting to find face locations using CNN. From what I understand, this was because of the high resolution of the images I was loading (largest: 6016 × 4016 pixels). My options were to either reduce the resolution of all my images or move away from the more memory-intensive CNN. For the time being, I decided to use HOG (Histogram of Oriented Gradients) instead. HOG-based detectors pair gradient features with a classifier such as an SVM to determine whether a face is present. Comparisons between HOG and CNN suggest that HOG is faster in terms of computation time but less accurate, while CNN tends to be the more accurate of the two. You can read Maël Fabien's work on face detection, where he covers both these models in great depth.
start = time.perf_counter() starts a counter. This value is utilized at the end and serves as a basic time metric for code performance.

FRAME_THICKNESS and COLOR are optional and can be ignored.
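
To make the effect of TOLERANCE a little more concrete, here is a minimal sketch of a distance-based comparison. It assumes that a match is decided by checking whether the Euclidean distance between two encodings is at most the tolerance, which is my understanding of how compare_faces behaves. The helper names and the toy 3-value encodings are made up for illustration (real encodings have 128 values).

```python
import math


def face_distance(known_encoding, candidate_encoding):
    """Euclidean distance between two encodings (hypothetical helper)."""
    return math.dist(known_encoding, candidate_encoding)


def compare_to_known(known_encodings, candidate_encoding, tolerance=0.6):
    """Return one boolean per known face: True if it is within the tolerance."""
    return [
        face_distance(known, candidate_encoding) <= tolerance
        for known in known_encodings
    ]


# Toy encodings, purely illustrative.
known_faces = [(0.0, 0.0, 0.0), (0.3, 0.4, 0.0)]
unknown_face = (0.0, 0.0, 0.55)

print(compare_to_known(known_faces, unknown_face, tolerance=0.6))
print(compare_to_known(known_faces, unknown_face, tolerance=0.5))
```

With the looser tolerance the first known face counts as a match (its distance is about 0.55), while the stricter value of 0.5 rejects both. This is the same trade-off that pushed me from 0.6 down to 0.5.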

"""LOADING SOURCE IMAGES"""
log.info(f"loading source images")
for index, filename in enumerate(os.listdir(SOURCE_DATA_DIR)):
    if filename.endswith("JPG"):
        log.info(f"processing source image {filename}")
        img_path = SOURCE_DATA_DIR + filename
        # using image_to_numpy to load img file -> fixes image orientation -> face encoding is found
        img = image_to_numpy.load_image_file(img_path)
        try:
            source_img_encoding = face_recognition.face_encodings(img)[0]
            log.success("face encoding found")
            source_faces.append(source_img_encoding)
        except IndexError:
            log.error("no face detected")

if (len(os.listdir(SOURCE_DATA_DIR)) - 1) - len(source_faces) != 0:  # -1 for .gitignore
    log.warn(f"{str(len(source_faces))} faces found in {str(len(os.listdir(SOURCE_DATA_DIR)))} images")

We then proceed to iterate through each image in the source_dataset directory. Most of it is standard Python stuff. The condition if filename.endswith("JPG"):, albeit a bit crude, only exists to ignore the .gitignore file. Since all my pictures use the upper-case .JPG extension, I know it won't skip any of them.
Next we construct the image path and pass it as an argument to image_to_numpy.load_image_file(). To me, this is the most interesting bit in this snippet because if you've used face_recognition, you know it already has its own load_image_file() function that also returns a NumPy array of the image. The reason I had to use another library just for loading the image was the image orientation. In the earliest version of this code, I was baffled at how some faces failed to be detected. After some research, I learned that face detection completely fails if the image is turned sideways. image_to_numpy's load function reads the EXIF Orientation tag and then rotates the image if required.
Lastly, we pass the loaded image to the face_recognition.face_encodings() function. This returns a list with one face encoding per face in the image, but since I know that source_dataset contains only single shots of me, I can simply access the first element in the list. Hence the [0] at the end of this line. This bit is wrapped in a try-except block so that if face detection fails, the resulting IndexError doesn't crash the program (as it did when pictures were being loaded sideways). Otherwise, the encoding is appended to the source_faces (known faces) list we created in the INIT part of the code.
In the end, we check if the number of encoded faces is equal to the total number of items in our directory (minus the .gitignore) and print a log if it's not. This helped me figure out the sideways quirk.
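
To illustrate the orientation fix, here is a rough sketch of what "reading the EXIF Orientation tag and rotating" amounts to. This is not image_to_numpy's actual implementation. It works on a plain nested list instead of a real image, the function names are hypothetical, and it only handles the rotation-only tag values (1, 3, 6 and 8), ignoring the mirrored ones.

```python
def rotate_cw(pixels):
    """Rotate a row-major 2D grid by 90 degrees clockwise."""
    return [list(row) for row in zip(*pixels[::-1])]


# EXIF Orientation value -> number of clockwise quarter turns needed
# to display the image upright (mirrored variants 2, 4, 5, 7 omitted).
QUARTER_TURNS_CW = {1: 0, 3: 2, 6: 1, 8: 3}


def apply_exif_orientation(pixels, orientation):
    """Return an upright copy of the grid for a rotation-only EXIF value."""
    upright = [list(row) for row in pixels]
    for _ in range(QUARTER_TURNS_CW.get(orientation, 0)):
        upright = rotate_cw(upright)
    return upright


sideways = [[1, 2],
            [3, 4]]
print(apply_exif_orientation(sideways, 6))
```

A camera held sideways typically stores the pixels unrotated and only records the tag, which is why a loader that ignores the tag hands the detector a sideways face.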

"""MAIN PROCESS"""
log.info(f"Processing dataset")
for index, filename in enumerate(os.listdir(IMAGE_SET_DIR)):
    if filename.endswith("JPG"):
        log.info(f"processing dataset image {filename} ({index + 1}/{len(os.listdir(IMAGE_SET_DIR))})")
        img_path = IMAGE_SET_DIR + filename
        img = image_to_numpy.load_image_file(img_path)
        try:
            locations = face_recognition.face_locations(img, model=DETECTION_MODEL)
            encodings = face_recognition.face_encodings(img, locations)
            for face_encoding, face_location in zip(encodings, locations):
                results = face_recognition.compare_faces(source_faces, face_encoding, TOLERANCE)
                if True in results:
                    log.success("match found!")
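                    # optional start: draw a box on the matched face and save a marked copy
                    top_left = (face_location[3], face_location[0])
                    bottom_right = (face_location[1], face_location[2])
                    cv2.rectangle(img, top_left, bottom_right, COLOR, FRAME_THICKNESS)
                    # img was loaded as RGB, but OpenCV writes BGR, so convert before saving
                    cv2.imwrite(FACE_FOUND_DIR + "_" + filename, cv2.cvtColor(img, cv2.COLOR_RGB2BGR))
                    # optional end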
                    copy(img_path, FACE_FOUND_DIR)
                    break
                else:
                    log.warn("no match found")
        except IndexError:
            log.error("no face detected")
        except Exception as err:
            log.error(f"error encountered: {err}")

stop = time.perf_counter()

log.info(f"Total time elapsed {stop - start:0.4f} seconds")

We now attempt to read each image in the images directory to compare it with our list of known faces. The code starts the same way as our previous code block. Things change slightly when we enter the try-except block. Before, we knew that all images were from the source_dataset directory, which contained only single shots of me. The current directory also contains group shots. Therefore, we will first find the locations of all faces in the image: locations = face_recognition.face_locations(img, model=DETECTION_MODEL) and then use those locations to get the face encodings: encodings = face_recognition.face_encodings(img, locations). This will return a list of all face encodings found within an image. We then iteratively test each of these encodings against all our known faces: results = face_recognition.compare_faces(source_faces, face_encoding, TOLERANCE). If a match exists, we copy the picture to the face_found directory: copy(img_path, FACE_FOUND_DIR). Else, we move on to the next face encoding and then the next image.
Initially, when the tolerance was set to 0.7 and then 0.6, I had some pictures inside the face_found directory where my face did not exist. The optional bit comes in handy here. Whichever face the code considers a match, a bounding box of COLOR and FRAME_THICKNESS is drawn over it using the face locations we extracted earlier: cv2.rectangle(img, top_left, bottom_right, COLOR, FRAME_THICKNESS). This version of the image is saved with an underscore preceding its name. This way, we know we need to adjust the tolerance any time a face that isn't ours has a bounding box on it.
The very last bit stops the counter and prints the total time elapsed for this entire process.
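
The index juggling in the optional bit is easier to follow with named values. The sketch below assumes, as the code above does, that a face location is a (top, right, bottom, left) tuple and that OpenCV wants its corner points as (x, y). The helper name box_corners is hypothetical.

```python
def box_corners(face_location):
    """Convert (top, right, bottom, left) into OpenCV-style corner points."""
    top, right, bottom, left = face_location
    top_left = (left, top)          # (x, y) of the upper-left corner
    bottom_right = (right, bottom)  # (x, y) of the lower-right corner
    return top_left, bottom_right


print(box_corners((50, 300, 250, 100)))
```

The two points it returns are the same ones that top_left and bottom_right hold in the main loop.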

Conclusion and Future Work

At the end of this exercise, I had 804 images in the face_found directory. Since every match is saved twice (a plain copy plus a version with a bounding box and an underscore prefix), that works out to 402 matched images. Not knowing beforehand how many of the pictures are mine makes it a little hard to judge the accuracy of this code. The code should ideally be modified and tested against a smaller, labeled dataset so that actual results can be evaluated. Moreover, it took ~3.7 hours (13312.6160 seconds) for this code to process ~1700 pictures, and that was with the faster face detection model. Distributing the work across multiple processes or machines would likely make it a lot more time-efficient. Finally, the program would benefit from an added block of code that creates low-resolution variants of the images and then uses the CNN model instead of HOG for more accurate results.
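
As a starting point for the low-resolution idea, here is a small sketch of the arithmetic involved. It has not been tried in this project. The function names and the max_side value are assumptions, and the actual resizing (for example with cv2.resize) is left out. The idea is to detect faces on a smaller image and then map the locations back to the full-size picture.

```python
def downscale_size(width, height, max_side=1600):
    """Return (new_width, new_height, factor) so the longest side fits max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height, 1.0  # already small enough, leave untouched
    factor = max_side / longest
    return round(width * factor), round(height * factor), factor


def to_original_coords(face_location, factor):
    """Map a (top, right, bottom, left) location back to full resolution."""
    return tuple(round(value / factor) for value in face_location)


# Largest image in my dataset, shrunk to a hypothetical 1504 px limit.
new_w, new_h, factor = downscale_size(6016, 4016, max_side=1504)
print(new_w, new_h, factor)
print(to_original_coords((10, 200, 110, 100), factor))
```

Whether this shrinks the images enough to avoid the std::bad_alloc error on a given GPU would still need to be tested.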

For now, however, I was able to extract ~400 images out of ~1700 without much effort. For me, that is a success story. This exercise also allowed me to revisit Python and play around with face recognition, which in itself was a lot of fun. You can check out the complete code repo here.

DEV Community