DEV Community

Cover image for Creating Image Hash Collisions
Jeremy Mill
Jeremy Mill

Posted on

Creating Image Hash Collisions

Image Hashing

The problem

Say you have an API that allows customers to upload photos and you want to ban some set of nuisance photos. You might immediately think to use one of the hash functions you already know to check new uploads against a deny list. Hash functions like MD5, SHA1, or some other cryptographic hash function may come to mind. And this would work assuming that the photo never changes by even a single bit.

Unfortunately for us and our hypothetical API it is very easy to create two images that are perceived by a human to be the exact same, but whose cryptographic hashes are totally different. Take for instance the following two photos:

Two Wedding Photos

The left hand photo has a SHA1 sum of

bb3ff9bdf224e8a058d3c4c3c5fb0dd45034519b  wedding.jpeg
Enter fullscreen mode Exit fullscreen mode

and the right hand photo has a SHA1 sum of

ed14e6d3eb42ad66d5a5f49e554b9e0f765da0cd  wedding-compressed.jpeg
Enter fullscreen mode Exit fullscreen mode

A human would perceive these two photos to be the same image. The only difference is that the right hand photo has been compressed by 1%. This effect can also be achieved by cropping 1 row or column of pixels or changing one pixel in the image from grey to 'slightly-darker-or-lighter-grey', as well.

Other hashing methods

So, clearly cryptographic hashes aren't going to work for us here. There are just too many ways to modify an image without losing the ability for a human to perceive them as the same. One solution to this problem is a class of hashes known as perceptual hashes. Some perceptual hashes are aHash, pHash, and dHash. The one we'll discuss in detail here and provide a collision/collision generator for is known as difference hashing (dHash).

As far as I can tell, the definitive source on these hashes is a blog written by Dr. Neal Krawetz: http://www.hackerfactor.com/blog/ and For dHash specifically, you should read this post. If you read that post you should feel free to skip the next section as Dr. Krawetz goes more in depth than I will.

How dHash works

At a very high level the dHash algorithm is:

  1. Create an 8x8 array of booleans that is the hash output
  2. Convert an image to greyscale
  3. Resize the image to a 9x8 pixels
  4. For each pixel you compare the left hand pixel to the right
    • If the right is brighter than the left, store True in the resulting hash array at the corresponding index
    • otherwise store False
  5. Flatten the hash array
  6. Convert the bools to bits and turn it into a string
  7. Return that string as a hash

So for example, our wedding photo from above when converted and compressed turns into:

wedding converted and compressed

And results in the following hash:

[[ True False False False False  True False False]
 [ True False  True False False  True False False]
 [ True  True False  True False  True  True False]
 [ True False False  True  True False  True False]
 [ True False False  True  True False False  True]
 [ True False False  True  True False  True  True]
 [ True  True False False False False  True  True]
 [ True  True  True False False  True False False]]
Enter fullscreen mode Exit fullscreen mode

Where the first item in the hash matrix (0,0) is True because the pixel in the image at (0,0) is slightly darker than the pixel in the image at (0,1).

When this hash array is folded and turned into a string, it results in the dHash value 84a4d69a999bc3e4. According to Dr. Krawetz, who has a larger data set than I do, this method has a relatively low false-positive and false-negative rate and is significantly faster than the other perceptual hashes tested.

Creating Collisions

So, dHash is pretty cool! But what if I want to intentionally create a collision? What if I have two images and I want them to have the same dHash value while being perceived by a human to be totally different images? That was the question I asked myself and was surprised to find that no one had answered (as far as I could tell). To be fair, even if someone had answered it, I probably would have tried to do it myself anyway.

I will call the image whose hash we want to match the collision target and the image we're going to modify the mod image. My algorithm for creating a collision is:

  1. get the hash for collision target
  2. Divide the image into 9x8 sub-images
  3. Iterate over the hash of collision target:
    • calculate the current hash of mod_image
    • if True: check that the sub-image to the right of the current is brighter than the current. If it's not, brighten it!
    • if False: check that the sub-image to the right of the current is darker than the current. If it's not, darken it
    • rebuild mod-image from the sub_images
  4. Check the hashes for equality re-iterate as necessary (repeat step 3) to make sure that the sampling used during resizing doesn't break the collision

Example

Given the following two images where the left hand image is the collision target and the right hand is the mod image:

target and mod

We can run the above algorithm and produce the following image which is a dHash collision with the collision target:

the collision image

Both of these images have a dHash value of 0193210ed49192c8

Proof of concept code

Proof of concept code can be found here: https://github.com/LivingInSyn/image_collsion_gen

At current this code will not work on very dark images due to the fact that brightening black just results in...more black. I'll spend some time trying to make it work in those situations though.

Conclusion

I'm not sure where exactly attacker generated collisions would have the greatest impact. It would potentially allow a malicious actor to post a picture that would cause other legitimate pictures to get deny-listed which would result in a denial of service type condition. It could also allow a malicious actor to post images that are supposed to be denied by making it collide with a picture on an allow list. Ultimately it is simply proof that you should not expect these hashes to be perfect or collision resistant.

That this was fun code to write. It nerd sniped me super hard for a few days and there are a lot of ways to go from here. Expanding this code to generate collisions among multiple hash types would be a logical next step, or to find ways to make the image less perceptibly modified. I'd love to hear your ideas if you have them :)

Top comments (0)