
Image Feature Detection and Matching using OpenCvSharp - First Steps Towards VinylEye

In a previous post, I went through the process of building a Docker image for OpenCvSharp that supports multiple processor architectures.

This post covers a couple of building blocks for implementing a simple content-based image retrieval system, where the query image is a photo of a vinyl album front cover matched against a database of cover art images scraped using the MusicBrainz API and the Cover Art Archive.

The posts planned as part of this series are as follows:

  • Image Feature Detection and Matching using OpenCvSharp - First Steps Towards VinylEye (This post)
  • Feature indexing and search for content-based image retrieval using smaller and larger database sizes.
  • Adding a front end and back end to VinylEye.
  • Instrumentation and packaging for deployment.
  • Expanding VinylEye with Deep Neural Networks.

The concepts used in the current post are:

  • The Scale Invariant Feature Transform (SIFT) algorithm for feature extraction.
  • Nearest-neighbour search using the OpenCV FlannBasedMatcher to build our database and match against query images.
  • Using OpenCV to compute a homography matrix and applying it for perspective correction.
  • A bare-bones .NET 7.0 command line application to run tests against a set of query images and capture the results.

As illustrated in the image below, given a query image in the middle of the left column (red outline), our goal is to find the cover art image that best matches it. Besides the fact that our query image contains a cover photo that is a little tired🥱 and upside down, we can see two good candidates from the database, located at the top right and bottom right (blue outlines). While that search functionality is part of the next post, this post focuses on the basic building blocks needed to implement such a system.

[Image: a query image (red outline) alongside candidate cover art matches from the database (blue outlines)]

Even if the album art (training image) exists in the database, it is unlikely to be an exact match for the following reasons:

  • The cover art images come from different releases of the album (CD, LP, Japanese release, ...).
  • They are photographed by users or scanned.
  • The query image will be:
    • At a different scale
    • Sometimes occluded by other objects
    • Possibly oriented differently from the database images

There are several algorithms for extracting and matching image features. Each has specific strengths as well as trade-offs with regard to the typical challenges of image processing, such as differences in illumination, scale, and orientation, and some also carry IP restrictions. In this post, we will focus on the Scale Invariant Feature Transform (SIFT) algorithm.

Scale Invariant Feature Transform (SIFT) Algorithm

The SIFT algorithm extracts key points that contain distinctive information and describe local features in images. The key points and features extracted by SIFT are robust to changes such as illumination or viewpoint differences, as they are invariant to image scale, rotation, and affine distortion. The algorithm was developed by David G. Lowe, and its patent expired on March 7, 2020. Since then, the implementation has moved from the non-free modules into the main OpenCV repository as part of the features2d module.

The SIFT algorithm starts by finding the locations of local extrema in scale space, which are then filtered and fitted to a mathematical model to determine their stability. Key points that pass the stability test are described using local image gradients, which are transformed into a representation that is invariant to orientation.

The resulting key points and features are then used for tasks such as object recognition, image matching, 3D reconstruction, and image stitching. Earlier uses of SIFT also included Simultaneous Localisation and Mapping (SLAM) applications that used a single camera and tracked the motion of a robot or vehicle between captured frames.
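
Before moving on to matching, here is a minimal sketch of what a SIFT extraction helper such as the ImageHelper.DetectAndComputeDescriptors method used later in this post might look like with OpenCvSharp (the repository's actual implementation may differ):

using OpenCvSharp;
using OpenCvSharp.Features2D;

public static class ImageHelper
{
    public static (Mat Descriptors, KeyPoint[] KeyPoints) DetectAndComputeDescriptors(Mat image)
    {
        // SIFT lives in the main features2d module since OpenCV 4.4.
        using var sift = SIFT.Create();
        var descriptors = new Mat();
        // Detect the key points and compute their 128-dimensional descriptors in one pass.
        sift.DetectAndCompute(image, null, out var keyPoints, descriptors);
        return (descriptors, keyPoints);
    }
}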

Matching Features

With SIFT, we have a choice of matching algorithms to choose from, several of which are available in OpenCV. As the code in this article uses OpenCV's implementation of the Fast Library for Approximate Nearest Neighbors (FLANN) based matcher, this section provides a brief background.

Using FLANN in conjunction with SIFT, we can speed up the feature matching process significantly. FLANN starts by building a tree-based index of the descriptors, which is the key to its fast approximate nearest-neighbour search.

One of the core concepts is the recursive subdivision of the descriptor space into multiple regions or cells. This subdivision continues until a stopping criterion is met, so the algorithm ultimately builds a hierarchical structure of nested regions, where each region contains a set of descriptors.

To perform feature matching with FLANN, key points and descriptors are first computed from the input images and the matcher is trained on these features. When we apply the same feature extraction process to a query image and search with its descriptors, FLANN finds the k nearest neighbours among the trained descriptors by traversing the tree-based index. The closest matching descriptors are considered correct matches and are returned along with the index of the training image in the data set.
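
The demo code later in this post matches one descriptor set directly against another, but the same matcher can also index many training images at once, as described above. A rough sketch, where trainDescriptorsList and queryDescriptors are placeholder names for descriptors computed as shown earlier:

// trainDescriptorsList: one Mat of SIFT descriptors per training image (placeholder input).
using var matcher = new FlannBasedMatcher();
matcher.Add(trainDescriptorsList);
matcher.Train(); // builds the FLANN index over all added descriptors

// k = 2 nearest neighbours per query descriptor; DMatch.ImgIdx tells us
// which training image each neighbour came from.
DMatch[][] matches = matcher.KnnMatch(queryDescriptors, 2);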

The FLANN algorithm offers a good trade-off between speed and accuracy, which makes it a good choice for applications where speed is important.

Feature Matching Demo

In this section, the three images below will be used to demonstrate the following:

  • Matching the features extracted from the query image and a specific training image, to show:
    • What the result looks like when matches are not filtered as per Lowe's paper.
    • Positive matches when filtered.
    • Negative matches when filtered.
  • Applying perspective correction to the query image on the left, then cropping and saving it with the correct aspect ratio based on the training image in the centre below.

Images used in the demo

Feature Matching without Filtering Good Matches

public static List<DMatch> MatchFeatures(Mat queryDescriptors, Mat trainDescriptors, double kRatioThreshold = 0.75, int minimumNumberOfMatches = 4)
{
    using var matcher = new FlannBasedMatcher();
    // Find the 2 nearest training descriptors for every query descriptor.
    var matches = matcher.KnnMatch(queryDescriptors, trainDescriptors, 2);
    // No ratio test here: simply keep the nearest neighbour of every match.
    return matches.Length >= minimumNumberOfMatches ? matches.Select(x => x[0]).ToList() : new List<DMatch>();
}

If we select feature matches between two images using the code above without any filtering, there will be many false matches. We can see this clearly in the visualisation below.

Matches between two images without ratio test
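
Although the post does not show it elsewhere, a visualisation like the one above can be produced with Cv2.DrawMatches. A minimal sketch, assuming the images, key points, and match list from the previous steps (variable names are placeholders):

// Draw lines between the matched key points of the two images.
using var visualisation = new Mat();
Cv2.DrawMatches(queryImage, queryKeyPoints, trainImage, trainKeyPoints, matches, visualisation);
Cv2.ImWrite("matching_pair.jpg", visualisation);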

Feature Matching with Filtering Good Matches

We can benefit from the SIFT feature ratio test when comparing the similarity of two features, as described in David Lowe's paper "Distinctive Image Features from Scale-Invariant Keypoints" introduced earlier. The ratio test compares the distances between a query feature and its closest and second closest matches among the features extracted from a training image. If the two distances are different enough (based on the ratio), the nearest match is kept as a good match.


public static List<DMatch> MatchFeatures(Mat queryDescriptors, Mat trainDescriptors, double kRatioThreshold = 0.75, int minimumNumberOfMatches = 4)
{
    using var matcher = new FlannBasedMatcher();
    // Find the 2 nearest training descriptors for every query descriptor.
    var matches = matcher.KnnMatch(queryDescriptors, trainDescriptors, 2);
    // Lowe's ratio test: keep the nearest neighbour only when it is sufficiently
    // closer than the second nearest; otherwise the match is ambiguous.
    var goodMatches =
        (from match in matches
            where match[0].Distance < kRatioThreshold * match[1].Distance
            select match[0])
        .ToList();
    return goodMatches.Count >= minimumNumberOfMatches ? goodMatches : new List<DMatch>();
}


Applying this filter will yield better results, as shown in the image below.

Matches between two images after ratio test is applied

Feature Matching: Negative Case

If we compare two different images and apply filtering, we will not get many matches. The pair below demonstrates no matches, although it is also possible to get a handful of false positives when comparing against a database of images.

[Image: two different covers compared after filtering, with no matches drawn between them]

Perspective Correction

Another use case for identifying matches between two images is un-warping, or perspective correction. If we can identify matching points between the two images, and if our subject is a planar surface, we can omit the third dimension and compute a homography matrix, which defines the transformation between these matching coordinate pairs.

Once the homography matrix is computed, we can apply it to the input image and optionally crop the output to restore the perspective based on the training image, without manual effort.
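
The code later in this section calls an ImageHelper.CalculateHomographyMatrix helper. A minimal sketch of such a helper, assuming OpenCvSharp's Cv2.FindHomography with RANSAC (the version in the repository may differ):

public static Mat CalculateHomographyMatrix(KeyPoint[] queryKeyPoints, KeyPoint[] trainKeyPoints, List<DMatch> matches)
{
    // A homography needs at least 4 point correspondences.
    if (matches.Count < 4) return null;

    // Collect the matching coordinates from both images.
    var queryPoints = matches.Select(m => queryKeyPoints[m.QueryIdx].Pt).Select(p => new Point2d(p.X, p.Y));
    var trainPoints = matches.Select(m => trainKeyPoints[m.TrainIdx].Pt).Select(p => new Point2d(p.X, p.Y));

    // RANSAC rejects outlier pairs while estimating the 3x3 homography
    // that maps query coordinates onto training image coordinates.
    return Cv2.FindHomography(queryPoints, trainPoints, HomographyMethods.Ransac, 5.0);
}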

The results can be pretty impressive, depending on the resolution of the source image:

[Image: query photo before and after perspective correction]

While the code below relies on OpenCV to compute the homography and then apply the perspective correction and cropping, as long as we have a list of coordinates extracted from the matches, we can also build the homography matrix by hand using the formula and solve it as a system of linear equations by taking the inverse of the matrix. However, as a true inverse requires a square matrix and ours is unlikely to be square, additional steps are needed to compute the pseudo-inverse, for example via Singular Value Decomposition (SVD); a sketch of this approach follows the code below.


private static int PerformPerspectiveCorrection(string outputDirectory)
{
    var queryImagePath = Path.Combine(outputDirectory, "made_in_europe_query.jpeg");
    var trainImagePath = Path.Combine(outputDirectory, "made_in_europe_train.jpg");

    using var queryImage = ImageHelper.LoadImage(queryImagePath, ImreadModes.Grayscale);
    using var trainImage = ImageHelper.LoadImage(trainImagePath, ImreadModes.Grayscale);

    // Extract SIFT key points and descriptors from both images.
    var (queryDescriptors, queryKeyPoints) = ImageHelper.DetectAndComputeDescriptors(queryImage);
    var (trainDescriptors, trainKeyPoints) = ImageHelper.DetectAndComputeDescriptors(trainImage);

    // Match with a stricter ratio threshold and estimate the homography from the matches.
    var matches = ImageHelper.MatchFeatures(queryDescriptors, trainDescriptors, kRatioThreshold: 0.5);
    var homographyMatrix = ImageHelper.CalculateHomographyMatrix(queryKeyPoints, trainKeyPoints, matches);

    if (homographyMatrix == null) return 0;

    // Enlarged canvas so the warped image is not clipped.
    double width = trainImage.Width + queryImage.Width;
    double height = trainImage.Height + queryImage.Height;

    using var queryImageColour = ImageHelper.LoadImage(queryImagePath, ImreadModes.Color);
    using var unWarpedImage = new Mat();

    Cv2.WarpPerspective(queryImageColour, unWarpedImage, homographyMatrix, new Size(width, height));

    // Crop to the training image's dimensions to restore the original aspect ratio.
    using var croppedAndUnWarpedImage = new Mat(unWarpedImage, new Rect(0, 0, trainImage.Width, trainImage.Height));

    Cv2.ImWrite(Path.Combine(outputDirectory, "query_perspective_corrected.jpeg"), croppedAndUnWarpedImage);
    return 1;
}


PerspectiveCorrectImageCommand.cs
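
As mentioned above, the homography matrix can also be assembled by hand from the matched coordinates. Below is a rough sketch of that idea: build the 2N x 8 linear system from N point correspondences and let OpenCV's SVD-backed solver compute the least-squares (pseudo-inverse) solution. The method name and structure are illustrative only, not part of the repository:

// Solves A * h = b for the 8 unknown homography parameters (h9 is fixed to 1),
// using SVD so the non-square, over-determined system is handled via the pseudo-inverse.
public static Mat SolveHomographyByHand(Point2d[] src, Point2d[] dst)
{
    int n = src.Length; // requires n >= 4 correspondences
    using var a = new Mat(2 * n, 8, MatType.CV_64FC1);
    using var b = new Mat(2 * n, 1, MatType.CV_64FC1);

    for (var i = 0; i < n; i++)
    {
        double x = src[i].X, y = src[i].Y, u = dst[i].X, v = dst[i].Y;
        // Row for u: [x y 1 0 0 0 -u*x -u*y] . h = u
        a.Set(2 * i, 0, x); a.Set(2 * i, 1, y); a.Set(2 * i, 2, 1.0);
        a.Set(2 * i, 3, 0.0); a.Set(2 * i, 4, 0.0); a.Set(2 * i, 5, 0.0);
        a.Set(2 * i, 6, -u * x); a.Set(2 * i, 7, -u * y);
        b.Set(2 * i, 0, u);
        // Row for v: [0 0 0 x y 1 -v*x -v*y] . h = v
        a.Set(2 * i + 1, 0, 0.0); a.Set(2 * i + 1, 1, 0.0); a.Set(2 * i + 1, 2, 0.0);
        a.Set(2 * i + 1, 3, x); a.Set(2 * i + 1, 4, y); a.Set(2 * i + 1, 5, 1.0);
        a.Set(2 * i + 1, 6, -v * x); a.Set(2 * i + 1, 7, -v * y);
        b.Set(2 * i + 1, 0, v);
    }

    using var h = new Mat();
    Cv2.Solve(a, b, h, DecompTypes.SVD); // least-squares solution via SVD

    // Reshape the 8 parameters into a 3x3 matrix with the last element set to 1.
    var homography = new Mat(3, 3, MatType.CV_64FC1);
    for (var i = 0; i < 8; i++) homography.Set(i / 3, i % 3, h.Get<double>(i, 0));
    homography.Set(2, 2, 1.0);
    return homography;
}

With well-spread correspondences this should produce a matrix comparable to what Cv2.FindHomography returns, though without RANSAC's outlier rejection.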

Once we have the matrix, we can use it for perspective correction on an enlarged image canvas and then crop using the target image dimensions to achieve the image at the bottom right.

Perspective corrected image

In this post, I have provided minimal code to achieve feature extraction and matching as a starting point. The next post will build on this by covering indexing features from multiple images and querying the index to get information about matching images. I hope you have enjoyed the post; please feel free to share your perspectives and use cases for these techniques.

Using Docker to Verify the Commands

The image built as part of this post supports both the amd64 and arm64 architectures and, as long as the volume mounts are adjusted accordingly, should work on either. Below is an example in a terminal on a MacBook Pro 2017, running both the match and perspective correct commands and copying the image files to the directory mounted from our machine.

For more context, please refer to the VinylEye repository.

# a blank directory to pass to the container
~ $mkdir images
~ $ls images
# mount the directory and execute the container with default arguments:
~ $docker run -v $(pwd)/images:/output syamaner/vinyleye:1
~ $ls images
made_in_europe_query.jpeg       query_perspective_corrected.jpeg
made_in_europe_train.jpg        the_real_thing.jpg
# Override the CMD arguments passed to use match command instead:
~ $docker run -v $(pwd)/images:/output syamaner/vinyleye:1  match --output-directory /output
~ $ls images
made_in_europe_query.jpeg       matching_pair.jpg           query_perspective_corrected.jpeg
made_in_europe_train.jpg        non_matching_pair.jpg           the_real_thing.jpg
~ $

Links

VinylEye repository
OpenCV
OpenCVSharp
Calculating Homography Matrix with OpenCVSharp
