Ruben Ghafadaryan

Posted on Nov 9

Detecting Logo Similarity: Combining AI Embeddings with Fourier Descriptors

#python #ai #multimodal #fft

Introduction

This article started from a conversation in our V-Mobile office. We were discussing cases where new company logos suspiciously resembled famous brands. In many instances, these similarities seemed intentional—designed to confuse customers and boost sales, especially in smaller markets.
This got me thinking: Could we build a system to automatically detect when a new logo copies an existing one?
At first glance, this looks like a straightforward image similarity problem. Many tools handle this well. However, logos are special. They're not like regular photos or illustrations, and as I discovered, detecting logo similarities is far more challenging than expected.

The Challenge with AI-Based Tools

As AI enthusiasts, we naturally started with popular AI models.

DINO: Great, But Not Perfect

DINO is excellent for image similarity detection. However, it can be easily confused by background changes or gradient fills.
Example: Here are Image 1 and Image 2, a slightly modified version of Image 1. When I tested them with DINO (specifically dinov2-small), the cosine distance between their embeddings was 0.56. (Note: Throughout this article, "distance" means cosine distance unless specified otherwise.)

This high distance means DINO thinks they're quite different, even though they're clearly similar to human eyes. This creates false negatives—we might miss real similarities.

Image 1.

Image 2.
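For readers who want to reproduce the numbers, cosine distance is simply one minus the cosine similarity of two embedding vectors. Below is a minimal sketch in plain Python; the function name and the toy vectors are illustrative assumptions, and real DINO embeddings have hundreds of dimensions.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)

# Toy vectors (hypothetical): same direction vs. orthogonal.
same_direction = cosine_distance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_distance([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```

Vectors pointing in the same direction give a distance close to 0, and orthogonal vectors give a distance of 1, so a value like 0.56 sits well away from "nearly identical".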

CLIP: Another Piece of the Puzzle

CLIP is another powerful similarity tool. It builds embeddings based on what the image represents semantically—in other words, it tries to describe the picture's content.
This works great for most images, but logos often contain abstract curves and shapes that don't have clear semantic meaning. When I compared two visually different images, Image 1 and Image 3, CLIP gave them a distance of 0.80, suggesting they're quite similar just because they share some semantic elements.

Image 3.

The Verdict

Relying solely on CLIP or DINO won't give us reliable results. We need additional tools.

Bringing Vectors into the Mix

We needed something to help re-rank results from CLIP and DINO. Ideally, this tool should be:

Invariant to colors
Optionally invariant to rotations or scaling (in case someone tries to trick the system)

I decided to explore vector representations. What if we convert raster images to vectors and analyze the vector data? This could give us more flexibility.

Converting Images to Vectors

First, I converted PNG logos to SVG vector files. But before conversion, I preprocessed each image:

Remove the alpha channel to eliminate transparency
Remove background using rembg
Crop near-white colors to avoid confusing the tracer with minor elements
Limit the maximum dimension to 1024 pixels
Remove noise using a median filter
Increase contrast for clearer edges
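To make two of these steps concrete, here is a small pure-Python sketch of a 3x3 median filter and of the size calculation for the 1024-pixel limit. It is only an illustration: the function names are hypothetical, the image is modeled as a 2D list of grayscale values, and a real pipeline would more likely rely on an imaging library.

```python
from statistics import median_low

def median_filter_3x3(gray):
    """Apply a 3x3 median filter to a 2D list of grayscale values.

    At the borders the window is clipped to the pixels that exist.
    """
    height, width = len(gray), len(gray[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = [
                gray[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            # median_low keeps the result an existing pixel value
            out[y][x] = median_low(window)
    return out

def fit_within(width, height, max_dim=1024):
    """Return a size whose longer side is at most max_dim, keeping aspect ratio."""
    longest = max(width, height)
    if longest <= max_dim:
        return width, height
    scale = max_dim / longest
    return max(1, round(width * scale)), max(1, round(height * scale))

# A single bright "noise" pixel surrounded by dark pixels.
noisy = [
    [10, 10, 10],
    [10, 255, 10],
    [10, 10, 10],
]
cleaned = median_filter_3x3(noisy)
target_size = fit_within(2048, 1024)
```

The isolated bright pixel disappears after filtering, which is the kind of speckle that would otherwise produce tiny spurious paths in the traced output.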

After preprocessing, I fed the images to vtracer. To keep things consistent, I limited the output to cubic Bézier curves: parametric curves defined by 4 control points.
The results were promising! The vectorized versions captured the essential shapes while eliminating noise.
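As a reminder of what a cubic Bézier segment is, the sketch below evaluates one from its four control points using the standard Bernstein form and samples it at evenly spaced parameter values. The function names and the example control points are assumptions made for illustration.

```python
def cubic_bezier_point(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier curve at parameter t in [0, 1]."""
    u = 1.0 - t
    # Bernstein basis weights for the four control points
    b0, b1, b2, b3 = u ** 3, 3 * u * u * t, 3 * u * t * t, t ** 3
    x = b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0]
    y = b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1]
    return (x, y)

def sample_cubic_bezier(p0, p1, p2, p3, n=16):
    """Return n points evenly spaced in t, including both endpoints."""
    if n < 2:
        raise ValueError("need at least two samples")
    return [cubic_bezier_point(p0, p1, p2, p3, i / (n - 1)) for i in range(n)]

# Hypothetical arch-shaped segment from (0, 0) to (1, 0).
points = sample_cubic_bezier((0, 0), (0, 1), (1, 1), (1, 0), n=5)
```

Note that points evenly spaced in t are not evenly spaced along the curve; whether that matters depends on how the samples are used later.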

Image 4. Original PNG Logo File

Image 5. Pre-processed image after tracing (a screenshot, as the article editor does not allow uploading SVG files).

Analyzing Bézier Curves with Fourier Descriptors

Now we have SVG files, but we can't compare text files directly. Instead, we need to compare their geometric components.
vtracer gives us paths as cubic Bézier curves. Here's how we extract meaningful data:

Sample the curves: Since Bézier curves are easy to evaluate at any point, we sample each curve into a fixed number of 2D points
Apply Fourier Transform: We treat this sequence of points as a signal and apply a Discrete Fourier Transform (DFT)
Extract Fourier descriptors: The low-frequency Fourier coefficients become our shape descriptor
Normalize: We normalize the sampled points to make them comparable: subtract the centroid (translation invariance), divide by scale (scale invariance), and optionally fix the starting point (rotation invariance)

Now each curve is represented by a fixed-length vector that we can store and compare, just like other embeddings.
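The steps above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the project's actual code: points are treated as complex numbers, scale is taken as the RMS distance from the centroid, and the descriptor keeps only coefficient magnitudes, which is one common way to remove the dependence on rotation and starting point. The function name and the `n_coeffs` parameter are hypothetical.

```python
import cmath
import math

def fourier_descriptor(points, n_coeffs=8):
    """Magnitudes of the lowest non-zero DFT frequencies of a 2D point sequence."""
    n = len(points)
    if n < 2 * n_coeffs + 1:
        raise ValueError("not enough points for the requested coefficients")
    # Translation invariance: subtract the centroid.
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    z = [complex(x - cx, y - cy) for x, y in points]
    # Scale invariance: divide by the RMS distance from the centroid.
    scale = math.sqrt(sum(abs(v) ** 2 for v in z) / n)
    if scale == 0.0:
        raise ValueError("all points coincide")
    z = [v / scale for v in z]

    def dft_coeff(k):
        # Naive O(n) evaluation of one DFT coefficient.
        return sum(v * cmath.exp(-2j * math.pi * k * i / n)
                   for i, v in enumerate(z)) / n

    descriptor = []
    for k in range(1, n_coeffs + 1):
        # Keep positive and negative low frequencies; magnitudes drop the phase.
        descriptor.append(abs(dft_coeff(k)))
        descriptor.append(abs(dft_coeff(-k)))
    return descriptor

# A sampled circle, and the same circle scaled by 2 and shifted (toy data).
circle = [(3 + 5 * math.cos(2 * math.pi * i / 32),
           -2 + 5 * math.sin(2 * math.pi * i / 32)) for i in range(32)]
moved = [(10 + 2 * x, -7 + 2 * y) for x, y in circle]
d1 = fourier_descriptor(circle, n_coeffs=4)
d2 = fourier_descriptor(moved, n_coeffs=4)
```

Both descriptors come out the same up to floating-point error, since translation and scale have been normalized away. For long point sequences an FFT routine would replace the naive sum.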

Image 6. An AI-generated image illustrating extraction of Fourier descriptors.

The key advantage: Unlike CLIP and DINO, these descriptors capture pure geometry rather than semantics, making them better for fine-grained shape comparison.

The Catch: False Positives

Unfortunately, this approach has its own problem: false positives. Completely different images might contain similar curves, producing misleadingly high similarity scores.

For example, when comparing two clearly similar images, Image 1 and Image 2, the Fourier descriptor distance was 0.63 (moderately similar). But when comparing one of them to a completely different image, Image 3, the distance was 0.89, only slightly higher.

I also tried calculating Chamfer distance between individual Bézier curves for point-to-point matching, but this made things worse. The problem remained: too many false positives.
At this point, I needed to step back and rethink the approach.
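For reference, Chamfer distance between two sampled curves can be sketched as below. Definitions vary (some average the two directions or use squared distances); this version sums the mean nearest-neighbor distance in each direction, and the function name is an assumption.

```python
import math

def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance between two non-empty 2D point sets."""
    if not points_a or not points_b:
        raise ValueError("both point sets must be non-empty")

    def mean_nearest(src, dst):
        # Average distance from each source point to its closest target point.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return mean_nearest(points_a, points_b) + mean_nearest(points_b, points_a)

# Two horizontal segments one unit apart (toy data).
lower = [(0.0, 0.0), (1.0, 0.0)]
upper = [(0.0, 1.0), (1.0, 1.0)]
gap = chamfer_distance(lower, upper)
identical = chamfer_distance(lower, lower)
```

Because every point only looks for its nearest neighbor, short generic curve fragments from unrelated logos can match each other well, which is consistent with the false positives described above.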

The Solution: A Combined Approach

After all this experimentation, I reached these conclusions:

DINO is powerful but can produce false negatives
CLIP is powerful but can produce false positives
Fourier descriptors are prone to false positives, but can still help filter noise

Each method has strengths and weaknesses. The solution? Combine them all.

The Weighted Formula

Similarity = (DINO × 0.7) + (CLIP × 0.2) + (Fourier × 0.1)

I assigned the highest weight to DINO since it's generally most reliable. CLIP gets a moderate weight, and Fourier descriptors get a small weight just to help filter edge cases.
These weights came from empirical testing and produced much more reliable results.
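In code, the formula is a plain weighted sum. The sketch below assumes each input is already a similarity score in the range 0 to 1 (for example, one minus the cosine distance); that conversion, the function name, and the sample scores are assumptions for illustration only.

```python
# Weights taken from the formula above.
WEIGHTS = {"dino": 0.7, "clip": 0.2, "fourier": 0.1}

def combined_similarity(dino, clip, fourier, weights=WEIGHTS):
    """Weighted sum of three per-method similarity scores."""
    return (weights["dino"] * dino
            + weights["clip"] * clip
            + weights["fourier"] * fourier)

# Hypothetical per-method similarities for one pair of logos.
score = combined_similarity(dino=0.9, clip=0.5, fourier=0.2)
```

Since the weights sum to 1, the combined score stays in the same 0 to 1 range as its inputs.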

The Optimized Search Strategy

When searching through a database of logos, we don't need to calculate everything for every image. Here's an efficient multi-stage approach:

Stage 1: Use DINO to retrieve initial candidates, then filter them with CLIP. Use thresholds to stop the search early when a high-similarity match is found or when no candidate is similar enough
Stage 2: Use Fourier descriptors to re-rank the remaining candidates
Stage 3 (optional): Re-rank the top results using Chamfer distance with per-path Fourier descriptors

Optionally, before starting the multi-stage approach, we can:

Search by SHA256 hash to find exact copies of the image
Search by perceptual hash to find copies with minor modifications

This staged approach gives us accurate results while avoiding unnecessary calculations.
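A rough sketch of this flow is shown below. It is not the repository's code: the record layout, the threshold values, and the function names are all hypothetical, a linear scan stands in for a vector index, and the perceptual-hash check and optional Chamfer stage are left out for brevity.

```python
import hashlib
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms

def make_record(name, image_bytes, dino, clip, fourier):
    """Bundle the hash and the three descriptors for one logo (hypothetical layout)."""
    return {"name": name, "sha256": hashlib.sha256(image_bytes).hexdigest(),
            "dino": dino, "clip": clip, "fourier": fourier}

def search(query, database, dino_max=0.5, clip_max=0.5, top_k=5):
    # Pre-check: an identical SHA256 means an exact copy, so stop early.
    exact = [r for r in database if r["sha256"] == query["sha256"]]
    if exact:
        return exact[:top_k]
    # Stage 1: DINO retrieves candidates, CLIP filters them (assumed thresholds).
    candidates = [r for r in database
                  if cosine_distance(query["dino"], r["dino"]) <= dino_max]
    candidates = [r for r in candidates
                  if cosine_distance(query["clip"], r["clip"]) <= clip_max]
    # Stage 2: re-rank the survivors by Fourier descriptor distance.
    candidates.sort(key=lambda r: cosine_distance(query["fourier"], r["fourier"]))
    return candidates[:top_k]

# Toy two-dimensional "embeddings" in place of real model outputs.
db = [
    make_record("a", b"a-bytes", [1.0, 0.0], [1.0, 0.0], [1.0, 0.0]),
    make_record("b", b"b-bytes", [0.9, 0.1], [0.8, 0.2], [0.5, 0.5]),
    make_record("c", b"c-bytes", [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]),
]
query = make_record("q", b"q-bytes", [1.0, 0.05], [1.0, 0.1], [1.0, 0.1])
results = [r["name"] for r in search(query, db)]
copy = make_record("copy", b"a-bytes", [0.0, 1.0], [0.0, 1.0], [0.0, 1.0])
exact_hit = [r["name"] for r in search(copy, db)]
```

The cheap checks run first and the more expensive comparisons only touch the candidates that survive, which is where the savings come from on a larger database.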

The Implementation

I've built a proof of concept system that includes:

A combined storage solution using SQLite3 and FAISS
Storage for DINO embeddings, CLIP embeddings, and Fourier descriptors (both combined and per-path)
SHA256 hash and perceptual hashes for each image
Scripts to populate the database with PNG images
A search script to find similar logos in the database
A direct comparison script for two specific logos
Support for both GPU and CPU processing

The code is still under development and is not guaranteed to be stable, but it illustrates the approaches and techniques used.

https://github.com/rghafadaryan/logo-similarity

Testing Data

For this work, I used a subset of 500 logo images from the Large Logo Dataset.
Direct download: https://data.vision.ee.ethz.ch/sagea/lld/data/LLD-logo_sample.zip

What's Next?

This project is ongoing. The combined approach shows promising results, but there's always room for improvement. I'm continuing to refine the weights, explore additional geometric features, and test on larger datasets.

I'll be back with more results as this work progresses. If you're working on similar problems or have suggestions, I'd love to hear from you in the comments!

AI Use Disclaimer

AI assistance was used in preparing this article to help with grammar, wording, and clarity, since English is not my native language.

For the coding part of the project, AI-based copilots were used mostly in calculation-heavy sections.
However, every line of code was personally reviewed and verified by me before use.

All technical decisions, conclusions, and interpretations described here represent my own work.
