
Benjamin Delespierre

Perceptual Hashing in PHP

TL;DR

Install the lib with composer:

composer require bdelespierre/php-phash

It exposes 2 commands:

vendor/bin/phash generate <image>
vendor/bin/phash compare <image1> <image2>

Perceptual Hashing

Let's say you are developing a social network and you want to prevent people from reposting other people's content. How would you do that?

You can't really use an MD5 checksum: the files you're comparing will likely differ slightly (after re-encoding, resizing, or a metadata change, for example), so their hashes won't match.
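
To see why a checksum is the wrong tool here, consider this minimal sketch. The two byte strings are made-up stand-ins for image files that differ by a single byte; they are not real image data.

```php
<?php
// Hypothetical stand-ins for two nearly identical image files.
$original = str_repeat("\x10\x20\x30", 100);
$repost   = $original;
$repost[150] = "\x21"; // change a single byte

$hashOriginal = md5($original);
$hashRepost   = md5($repost);

// Count how many of the 32 hex characters differ between the two checksums.
$differing = 0;
for ($i = 0; $i < 32; $i++) {
    if ($hashOriginal[$i] !== $hashRepost[$i]) {
        $differing++;
    }
}

echo $hashOriginal, PHP_EOL;
echo $hashRepost, PHP_EOL;
echo "Hex characters that differ: $differing / 32", PHP_EOL;
```

A one-byte change is enough to produce an unrelated checksum, so you should expect most of the hex characters to differ. The checksum tells you the files are not identical, but nothing about how close they are.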

Fortunately, there's a very simple method to determine programmatically whether an image "looks like" another: Perceptual Hashing.

Lucky you, I just wrote a lib to do just that! Here's how it works.

Demonstration

Note: the instructions below are borrowed from Hackerfactor; thanks to them for introducing me to this algorithm.

Step 1: resize the image to 8x8 pixels.
Step 2: reduce the colors to grayscale.
Step 3: calculate the average color value.
Step 4: iterate over the pixels to compute the bits; if the pixel value is above the average, it's a one, otherwise it's a zero.
Step 5: assemble the 64 bits into the hash.
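
Steps 3 to 5 can be tried without any image library. The sketch below assumes steps 1 and 2 are already done and the grayscale values (0-255) sit in a plain array; the function name averageHashFromGrid and the sample grid are made up for illustration, and the grid is 4x4 instead of 8x8 to keep it readable.

```php
<?php
/**
 * Hypothetical helper: builds an average hash from a grid of grayscale
 * values (0-255), i.e. the output of steps 1 and 2.
 */
function averageHashFromGrid(array $grid): string
{
    // Flatten the rows into a single list of pixel values.
    $pixels = array_merge(...$grid);

    // Step 3: compute the average value.
    $mean = array_sum($pixels) / count($pixels);

    // Steps 4 and 5: one bit per pixel, '1' if above the average.
    $bits = '';
    foreach ($pixels as $value) {
        $bits .= $value > $mean ? '1' : '0';
    }

    return $bits;
}

// Made-up grid: dark on the left, bright on the right.
$grid = [
    [10, 20, 200, 210],
    [15, 25, 220, 230],
    [12, 22, 205, 215],
    [18, 28, 225, 235],
];

echo averageHashFromGrid($grid), PHP_EOL; // 0011001100110011
```

The average here is 118.125, so the two dark columns become zeros and the two bright ones become ones. This sketch walks the grid row by row; any order works as long as every hash you compare is built with the same one.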

In PHP

It looks a little bit like this:

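// Steps 1 and 2: resize to $size x $size pixels and convert to grayscale.
$image = $this->manager->make($file)
    ->resize($size, $size)
    ->greyscale();

// Step 3: sum the first channel of every pixel to compute the average.
$sum = 0;
for ($x = 0; $x < $size; $x++) {
    for ($y = 0; $y < $size; $y++) {
        $sum += $image->pickColor($x, $y, 'array')[0];
    }
}

$mean = $sum / ($size ** 2);

// Steps 4 and 5: append one bit per pixel, 1 if it is above the average.
$bits = "";
for ($x = 0; $x < $size; $x++) {
    for ($y = 0; $y < $size; $y++) {
        $bits .= $image->pickColor($x, $y, 'array')[0] > $mean ? '1' : '0';
    }
}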

You don't have to copy that. Just grab the package.

How to compare hashes

Our hash here is simply a bitfield. I expressed it as a string instead of an actual bitfield because it's easier to manipulate, especially for beginners.
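
If the string form feels bulky to store, it can be packed into hexadecimal, 4 bits per character, so a 64-bit hash fits in 16 characters. This is only a sketch: bitsToHex and hexToBits are hypothetical helpers, not functions from the package, and they assume the bit string length is a multiple of 4.

```php
<?php
/**
 * Hypothetical helper: packs a bit string into hexadecimal.
 * Assumes strlen($bits) is a multiple of 4.
 */
function bitsToHex(string $bits): string
{
    $hex = '';
    foreach (str_split($bits, 4) as $nibble) {
        $hex .= dechex(bindec($nibble));
    }

    return $hex;
}

/** Hypothetical inverse of bitsToHex(). */
function hexToBits(string $hex): string
{
    $bits = '';
    foreach (str_split($hex) as $char) {
        // Left-pad so that every hex character yields exactly 4 bits.
        $bits .= str_pad(decbin(hexdec($char)), 4, '0', STR_PAD_LEFT);
    }

    return $bits;
}

$bits = '1111000010100101';
$hex  = bitsToHex($bits);

echo $hex, PHP_EOL;            // f0a5
echo hexToBits($hex), PHP_EOL; // 1111000010100101
```

Converting one nibble at a time avoids pushing a full 64-bit value through bindec(), which would overflow PHP's signed integers.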

Now, to determine how "far" an image is from another, we need to compare their hashes. We can do that using the Hamming distance: the number of positions at which the two hashes differ.

$hash1 = phash('images/1.jpg');
$hash2 = phash('images/2.jpg');
$size  = strlen($hash1);

for ($dist = 0, $i = 0; $i < $size; $i++) {
    if ($hash1[$i] != $hash2[$i]) {
        $dist++;
    }
}

$similarity = 1 - $dist / $size;

And voilà! $similarity will be a float between 0 (entirely different) and 1 (exactly the same). As a rule of thumb, a value above 0.95 (at most 3 differing bits out of 64) means the images are very close.
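
The comparison can be wrapped in a small function that also rejects hashes of different lengths. This is a minimal sketch: hashSimilarity is a hypothetical helper, and the two hashes are made up so that they differ in exactly 2 positions.

```php
<?php
/**
 * Hypothetical helper: similarity between two bit strings of equal length,
 * computed as 1 - (Hamming distance / length).
 */
function hashSimilarity(string $hash1, string $hash2): float
{
    $size = strlen($hash1);
    if ($size === 0 || $size !== strlen($hash2)) {
        throw new InvalidArgumentException('Hashes must be non-empty and of equal length.');
    }

    // Hamming distance: count the positions where the bits differ.
    $dist = 0;
    for ($i = 0; $i < $size; $i++) {
        if ($hash1[$i] !== $hash2[$i]) {
            $dist++;
        }
    }

    return 1 - $dist / $size;
}

// Made-up 64-bit hashes that differ in exactly 2 positions.
$a = str_repeat('10', 32);
$b = $a;
$b[0]  = '0';
$b[63] = '1';

$similarity = hashSimilarity($a, $b);

echo $similarity, PHP_EOL;                                         // 0.96875
echo $similarity > 0.95 ? 'near duplicate' : 'different', PHP_EOL; // near duplicate
```

With 64 bits, each differing bit costs 1/64 (about 0.016) of similarity, so a fourth differing bit drops the score to 0.9375, below the 0.95 threshold.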


Leave a like and comment to tell me what you think.

See you soon for more useful snippets like this one!
