🚀 Understanding the Challenge
Preparing self-compiled image datasets—or cleaning up unorganized image dumps for training models—can be challenging. This is where perceptual hashing shines. A range of perceptual image hashing algorithms is available, each with its own pros and cons; for this library I'm focusing on hashing methods that reduce visual duplicates, as opposed to clustering.
During my MS, I used Nvidia's celeb faces research to train a ProGAN model for realistic face images. I then applied the model to a tougher, noisier case—cyberpunk cities—by scraping and compiling my own image set. With a deliberately low image count, I could test how various augmentation methods affected training. Still, duplicate images had to be minimized.
Starting with a manually collected ~3K image set, I reduced duplicates to yield roughly 2K unique images. I used the imagehash Python library and, after trying a few algorithms, ended up choosing dHash for its performance and reasonable results.
Three years later, while browsing Kaggle image datasets, I noticed many of the sets still suffer from ambiguity and duplicates. With few well-pruned sets available, I set out to create imgdd.
🦀 Why Rust?
I had been searching for a good candidate project to learn Rust. Back in 2012 I'd spent a good amount of time learning Java and C++ alongside core computer science concepts, but it has been a while since I've had much reason to develop in a compiled language. It occurs to me, though, that a lot of the productivity I've enjoyed in Python for scientific applications is accomplished through C and C-variant bindings. Now seemed like a great opportunity to combine my slightly "rusty" computer science skills with my newly acquired machine learning / data science knowledge. Ideally, the result could help facilitate processing larger image sets.
📌 Initial Approach
I've released Python packages to PyPI with CI/CD before, but I haven't spent much time building full-featured packaging around interop patterns. Since I'm planning to create the core functionality in Rust and then expose an interface for both Python and Rust, squeezing out as much time-complexity performance as I can, I'll be deploying heavier-handed quality checks while keeping the actual tests relatively lightweight.
Code Quality Checks
- Unit testing (code coverage)
- Integration testing (reliability)
- Core function benchmarking (quality)
- Integration benchmarking (quality)
- Comparison benchmarking (quality)
Time complexity being a primary focus, setting up Criterion benchmarking was relatively high on the to-do list. I found the setup incredibly straightforward. The framework is very friendly and even gives feedback on slow benchmarks, suggesting a specific sample count to reduce to. It reminded me a bit of running statistical tests in R.
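As a concrete illustration, here is a minimal Criterion harness of the kind described above. `hash_fn` is a stand-in workload, not imgdd's actual API:

```rust
use criterion::{criterion_group, criterion_main, Criterion};
use std::hint::black_box;

// Placeholder workload standing in for a real hashing function.
fn hash_fn(data: &[u8]) -> u64 {
    data.iter().fold(0u64, |acc, &b| acc.rotate_left(5) ^ b as u64)
}

fn bench_hash(c: &mut Criterion) {
    let data = vec![0u8; 64 * 64]; // 64x64 grayscale-sized buffer
    c.bench_function("hash_fn", |b| b.iter(|| hash_fn(black_box(&data))));
}

criterion_group!(benches, bench_hash);
criterion_main!(benches);
```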
The next strategic focus was matching my hashing implementations against imagehash and other researched implementations of common perceptual hash methods. aHash, mHash, and dHash are all relatively straightforward; pHash and wHash are more involved and require more nuance in their implementation.
🔥 Developing the Algorithms
Overview
Writing the algorithm code was more an exercise in optimizing each function's time complexity than anything else. There is plenty of information available covering the algorithm steps and the specific math required to accomplish the intended result.
Alongside verifying the algorithm math/logic, I checked the output against imagehash. I generated a red/black alternating pixel image and used the same cat image from this article to verify the hash, making sure imgdd's output was identical to imagehash's.
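For illustration, a fixture along those lines can be generated with the `image` crate; this is a sketch, not the exact test code from imgdd:

```rust
use image::{DynamicImage, Rgb, RgbImage};

// Build an alternating red/black checkerboard image for hash verification.
fn checkerboard(width: u32, height: u32) -> DynamicImage {
    let img = RgbImage::from_fn(width, height, |x, y| {
        if (x + y) % 2 == 0 {
            Rgb([255, 0, 0]) // red
        } else {
            Rgb([0, 0, 0]) // black
        }
    });
    DynamicImage::ImageRgb8(img)
}
```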
Algorithm Details
**aHash (Average Hash):**
- Compares each pixel to the average pixel value to generate a 64-bit binary hash.
```rust
pub fn ahash(image: &DynamicImage) -> Result<Self> {
    let mut sum = 0u64;
    let mut pixels = [0u8; 64];
    // Collect pixel values and compute sum
    for (i, (_, _, pixel)) in image.pixels().enumerate().take(64) {
        pixels[i] = pixel[0];
        sum += pixels[i] as u64;
    }
    // Collect average pixel value
    let avg = sum / 64;
    // Compute hash and store bits in the correct order
    let mut hash = 0u64;
    for (i, &pixel) in pixels.iter().enumerate() {
        if pixel as u64 > avg {
            hash |= 1 << (63 - i); // reverse order
        }
    }
    Ok(Self { hash })
}
```
**mHash (Median Hash):**
- Compares each pixel to the median pixel value to generate a 64-bit binary hash.
```rust
pub fn mhash(image: &DynamicImage) -> Result<Self> {
    let mut pixels = [0u8; 64];
    // Collect 64 pixel values
    for (i, pixel) in image.pixels().map(|p| p.2[0]).take(64).enumerate() {
        pixels[i] = pixel;
    }
    // Copy pixels so we don't modify the original array
    let mut pixels_copy = pixels;
    // Find median O(n)
    let mid = 32;
    let (low, median_hi, _high) = pixels_copy.select_nth_unstable(mid);
    // The lower partition is unordered, so take its max rather than low[mid - 1]
    let median_lo = *low.iter().max().unwrap();
    let median = (*median_hi as u64 + median_lo as u64) / 2; // Compute true median
    // Compute hash
    let mut hash = 0u64;
    for (i, &pixel) in pixels.iter().enumerate() {
        if pixel as u64 > median {
            hash |= 1 << (63 - i); // reverse order
        }
    }
    Ok(Self { hash })
}
```
**dHash (Difference Hash):**
- Encodes relative changes between adjacent pixels in a 64-bit binary hash.
```rust
pub fn dhash(image: &DynamicImage) -> Result<Self> {
    let mut hash = 0u64;
    // Expects a 9x8 grayscale image: 8 left-to-right comparisons per row
    for y in 0..8 {
        let mut current = image.get_pixel(0, y)[0];
        for x in 1..9 {
            let next = image.get_pixel(x, y)[0];
            hash = (hash << 1) | (next > current) as u64;
            current = next;
        }
    }
    Ok(Self { hash })
}
```
**pHash (Perceptual Hash):**
- **Overview:**
  - pHash analyzes the frequency domain of an image using the Discrete Cosine Transform (DCT).
  - Start with a 32×32 grayscale image.
  - A two-dimensional DCT is applied—first row-wise, then column-wise—to convert the spatial pixel values into frequency coefficients.
  - The top-left 8×8 block of DCT coefficients is extracted. These coefficients capture the low-frequency content (the overall structure) of the image.
  - The median of the resulting coefficients (excluding the DC component) is computed.
  - Each coefficient is compared against the median, generating a 64-bit binary hash.
- **Algorithm Details:**
  - **DCT Transformation:**
    - The DCT is applied in two passes (row- and column-wise) to obtain a frequency representation of the image.
    - The coefficient at position (0,0) (the DC component) is typically the largest since it represents the image’s average brightness.
  - **DCT-II (Forward Transform):** For an input sequence $x_n$, $n = 0, \dots, N-1$, the DCT-II is given by:

    $$X_k = \sum_{n=0}^{N-1} x_n \cos\left[\frac{\pi}{N}\left(n + \tfrac{1}{2}\right)k\right], \qquad k = 0, \dots, N-1$$

  - **DCT-II (Inverse Transform):** To reconstruct the original sequence from the DCT-II coefficients $X_k$, the inverse transform is:

    $$x_n = \frac{1}{N} X_0 + \frac{2}{N} \sum_{k=1}^{N-1} X_k \cos\left[\frac{\pi}{N}\left(n + \tfrac{1}{2}\right)k\right]$$

- **Implementation:**
```rust
pub fn phash(image: &DynamicImage) -> Result<Self> {
    const IMG_SIZE: usize = 32;
    const HASH_SIZE: usize = 8;
    // Collect pixel values from normalized 32x32 grayscale image
    let mut pixels: Vec<f32> = image
        .pixels()
        .map(|p| p.2[0] as f32)
        .collect();
    // Plan DCT once for both rows and columns
    let mut planner = DctPlanner::new();
    let dct = planner.plan_dct2(IMG_SIZE);
    // Apply DCT row-wise in-place
    for row in pixels.chunks_exact_mut(IMG_SIZE) {
        dct.process_dct2(row);
    }
    // Apply DCT column-wise in-place
    for col in 0..IMG_SIZE {
        let mut col_values: [f32; IMG_SIZE] = [0.0; IMG_SIZE];
        for row in 0..IMG_SIZE {
            col_values[row] = pixels[row * IMG_SIZE + col];
        }
        dct.process_dct2(&mut col_values);
        for row in 0..IMG_SIZE {
            pixels[row * IMG_SIZE + col] = col_values[row];
        }
    }
    // Extract top-left 8x8 DCT coefficients (low frequencies)
    let mut dct_lowfreq = [0f32; HASH_SIZE * HASH_SIZE];
    for y in 0..HASH_SIZE {
        for x in 0..HASH_SIZE {
            dct_lowfreq[y * HASH_SIZE + x] = pixels[y * IMG_SIZE + x];
        }
    }
    // Compute median excluding DC coefficient
    let mut ac_coeffs = dct_lowfreq[1..].to_vec();
    let mid = ac_coeffs.len() / 2;
    ac_coeffs.select_nth_unstable_by(mid, |a, b| a.partial_cmp(b).unwrap());
    let median = ac_coeffs[mid];
    // Generate hash
    let mut hash = 0u64;
    for (i, &val) in dct_lowfreq.iter().enumerate() {
        if val > median {
            hash |= 1 << (63 - i);
        }
    }
    Ok(Self { hash })
}
```
**wHash (Wavelet Hash):**
- **Overview:**
  - Uses Haar wavelet transformations to capture image features.
  - Starts with an 8×8 grayscale image.
  - The maximum decomposition level is computed as $\text{max\_level} = \log_2(\min(W, H)) = \log_2(8) = 3$. This means you can apply the Haar decomposition 3 times before the lowest-resolution (LL) sub-band becomes 1×1.
  - A 3-level forward Haar transform is performed to produce a vector of coefficients.
  - The four resulting sub-bands are:
    - LL: Low–Low (approximation; overall average information)
    - HL: High–Low (horizontal details)
    - LH: Low–High (vertical details)
    - HH: High–High (diagonal details)
  - The DC component (which captures the overall brightness) is then zeroed out before the inverse transform is applied.
  - The inverse Haar transform reconstructs the image (without the DC component), and the resulting pixel values are compared to their median to generate a 64-bit binary hash.
- **Algorithm Details:**
  - **2D Haar Transform**
    - **Forward:**
      - Row transform (pairwise, using the orthonormal convention):

        $$a_i = \frac{x_{2i} + x_{2i+1}}{\sqrt{2}}, \qquad d_i = \frac{x_{2i} - x_{2i+1}}{\sqrt{2}}$$

      - Column transform: the same averaging/differencing step applied down each column of the row-transformed result.
    - **Inverse:**
      - Inverse column transform, then inverse row transform, each reconstructing the original pairs:

        $$x_{2i} = \frac{a_i + d_i}{\sqrt{2}}, \qquad x_{2i+1} = \frac{a_i - d_i}{\sqrt{2}}$$

  - **DC Coefficient:**
    - For an 8×8 image with pixel values $p_{i,j}$, the DC coefficient is (up to the transform's normalization) roughly proportional to the sum of all pixels, i.e. the overall average brightness:

      $$\text{DC} \propto \sum_{i=0}^{7} \sum_{j=0}^{7} p_{i,j}$$

    - This coefficient is usually much larger than the other (AC) coefficients that capture edges and textures.
  - **Why Zero It Out?**
    - Robustness to illumination: since the DC component reflects the overall brightness, removing it makes the hash less sensitive to global illumination differences.
    - Preventing median skew: because the DC value is relatively large, including it in the median calculation could skew the threshold. Zeroing it ensures that the hash focuses on structural details rather than overall brightness.
```rust
pub fn whash(image: &DynamicImage) -> Result<Self> {
    const HASH_SIZE: u32 = 8;
    let ll_max_level: usize = 3;
    // Allocate flat vector of normalized pixels (row-major order).
    let total_pixels = (HASH_SIZE * HASH_SIZE) as usize;
    let mut pixels = Vec::with_capacity(total_pixels);
    for y in 0..HASH_SIZE {
        for x in 0..HASH_SIZE {
            let pixel = image.get_pixel(x, y);
            pixels.push(pixel[0] as f32 / 255.0);
        }
    }
    // ---------- Remove low-level frequency (DC) component ---------- //
    // Perform a full forward Haar transform - 8×8 image (3 levels).
    pixels.transform(Operation::Forward, &Haar::new(), ll_max_level);
    // Zero out the DC coefficient.
    pixels[0] = 0.0;
    // Perform inverse Haar transform (reconstruct image).
    pixels.transform(Operation::Inverse, &Haar::new(), ll_max_level);
    // ---------- Compute median O(n) ---------- //
    let mid: usize = 32;
    // Clone flat pixel vector.
    let mut flat = pixels.clone();
    // Quickselect: partition around the 33rd-smallest value.
    let (low, hi, _high) = flat.select_nth_unstable_by(mid, |a, b| a.partial_cmp(b).unwrap());
    // The lower partition is unordered, so take its max for the other middle value.
    let lo = low.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    // Compute median.
    let median = (lo + *hi) / 2.0;
    // Generate hash.
    let mut hash = 0u64;
    for (i, &val) in pixels.iter().enumerate() {
        if val > median {
            hash |= 1 << (63 - i);
        }
    }
    Ok(Self { hash })
}
```
✅ Lesson Learned
Running tests consistently will save you a lot of heartache as an engineer: even though I was just updating my docs, I ran the typical battery of tests only to notice about a 60% regression in my benchmarks. When I built the MkDocs site, my signature didn't include the intended defaults, since I was unwrapping the `None`s later in the function, a byproduct of developing the `imgddcore` and interface crates before the `imgddpy` crate. However, swapping the signature had a much larger impact than I expected.
```rust
// Docs-friendly signature with baked-in defaults (this caused the regression):
#[pyfunction(signature = (path, filter = "triangle", algo = "dhash", sort = false))]

// Original signature: defaults deferred via `None` and resolved in the function body:
#[pyfunction(signature = (path, filter = None, algo = None, sort = false))]
```
❌ The Impact
After making this change, benchmarks showed a significant performance drop. In my early comparison script, the results changed from:
Benchmark Results (in seconds per image):

| Metric      | imgdd    | imagehash |
|-------------|----------|-----------|
| min_time    | 0.002266 | 0.006736  |
| max_time    | 0.003773 | 0.010739  |
| avg_time    | 0.002766 | 0.008363  |
| median_time | 0.002671 | 0.008147  |
to:
Benchmark Results (in seconds per image):

| Metric      | imgdd    | imagehash |
|-------------|----------|-----------|
| min_time    | 0.006906 | 0.006850  |
| max_time    | 0.022984 | 0.009187  |
| avg_time    | 0.010833 | 0.007882  |
| median_time | 0.008730 | 0.007837  |
This represents roughly a 66% performance regression.
In the end, I reverted to using `None` for the default parameters in the PyO3 signature to ensure optimal performance. For the docs, I removed the signatures from the autodocs and provided the intended signature in the PyO3 function docstring.
```yaml
plugins:
  - include-markdown
  - mkdocstrings:
      handlers:
        python:
          options:
            show_signature: false # remove provided function signature
```
**Takeaways:**
- The PyO3 function signature conversion is less efficient than handling it within the function.
- Deferring conversion with `unwrap` at the point of use is significantly faster than embedding defaults directly in the signature (see the sketch below).
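For reference, here is a hypothetical sketch of the faster pattern: `Option` defaults in the signature, resolved with cheap unwraps in the body. The function name, parameters shown, and return type are placeholders rather than imgdd's real API:

```rust
use pyo3::prelude::*;

#[pyfunction(signature = (path, filter = None, algo = None, sort = false))]
fn hash_files(path: &str, filter: Option<&str>, algo: Option<&str>, sort: bool) -> PyResult<String> {
    // Defaults applied at the point of use instead of in the signature.
    let filter = filter.unwrap_or("triangle");
    let algo = algo.unwrap_or("dhash");
    Ok(format!("{path}:{filter}:{algo}:{sort}"))
}
```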
🎉 The Outcome
After a good amount of early refactoring, the results have been extremely encouraging. Not only did I manage to boost performance significantly—imgdd now consistently outperforms imagehash by roughly 60–95%—but I've also gotten to experience firsthand how friendly Rust can be. After spending the last few years practicing functional programming, I appreciated Rust's blend of low-level control with pseudo-functional paradigms. Here are some of my key takeaways:
- Rust's publishing tools (the dry-run flag, simple Cargo.toml configuration) are intuitive.
- Rust's built-in formatter rearranges imports in alphabetical order, which satisfies my OCD.
- Criterion has quickly become one of my favorite benchmarking libraries.
- I'm sure I've broken a cardinal Rust rule somewhere along the way; however, the overall development experience with Rust and Rust bindings was very positive.
💡 Stuff I Learned
**Enhanced Release & Packaging Strategies:**
- More familiar with the structure of multi-crate release patterns (thanks, polars!).
- More familiar with ABI compatibility and building multi-system packages, released in a matrix for PyPI upload.

**Efficient Interoperability:**
- More familiar with developing efficient PyO3 bindings.
- Deepened my understanding of efficient interop data handling.

**Algorithmic Mastery:**
- Implemented both the Haar wavelet transform and the discrete cosine transform for image vectors.
- Reinforced effective algorithmic implementation patterns.
- Learned reliable low-level optimization patterns.

**Robust Code Quality:**
- Experience implementing code coverage, benchmarking, and integration testing across a multi-interface interop library.
In short, this project not only produced a high-performance image hashing library but also served as a comprehensive exercise in getting my hands Rusty.
📌 References
Perceptual Hashing
- imagehash on GitHub
- Testing Different Image Hash Functions
- Perceptual Image Hashing Article (ScienceDirect)
- Evaluating Perceptual Image Hashes at OkCupid
- CEUR Workshop Proceedings on Perceptual Image Hashing
Discrete Cosine Transform
- Rust DCT Documentation
- Discrete Cosine Transform – CCRMA, Stanford
- Lecture Slides on DCT (Grainger, Illinois)
- Cosine Transform – MTU Teaching Material