🚀 Understanding the Challenge
Preparing self-compiled image datasets—or cleaning up unorganized image dumps for training models—can be challenging. This is where perceptual hashing shines. A range of perceptual image hashing algorithms is available, each with its own pros and cons; for this library I'm focusing on hashing methods that reduce visual duplicates, as opposed to clustering.
During my MS, I used Nvidia's celeb faces research to train a ProGAN model for realistic face images. I then applied the model to a tougher, noisier case—cyberpunk cities—by scraping and compiling my own image set. With a deliberately low image count, I could test how various augmentation methods affected training. Still, duplicate images had to be minimized.
Starting with a manually collected ~3K image set, I reduced duplicates to yield roughly 2K unique images. I used the imagehash Python library and, after trying a few algorithms, ended up choosing dHash for its performance and reasonable results.
Three years later, while browsing Kaggle image datasets, I noticed many of the sets still suffer from ambiguity and duplicates. With few well-pruned sets available, I set out to create imgdd.
🦀 Why Rust?
I had been searching for a good candidate project to learn Rust. Back in 2012 I'd spent a good amount of time learning Java and C++ alongside core computer science concepts, but it has been a while since I've had much reason to develop in a compiled language. It occurs to me, though, that a lot of the productivity I've enjoyed in Python for scientific applications is accomplished through C and C-variant bindings. Now seemed like a great opportunity to combine my slightly "rusty" computer science skills with my newly acquired machine learning / data science knowledge. Ideally, the result could help facilitate processing larger image sets.
📌 Initial Approach
I've released Python packages to PyPI with CI/CD before, but I haven't spent much time building full-featured packaging around interop patterns. Since I'm planning to create the core functionality in Rust and then expose an interface for both Python and Rust, squeezing out as much time-complexity performance as I can, I'll be deploying heavier-handed quality checks while keeping the actual tests relatively lightweight.
Code Quality Checks
- Unit testing (code coverage)
- Integration testing (reliability)
- Core function benchmarking (quality)
- Integration benchmarking (quality)
- Comparison benchmarking (quality)
Time complexity being a primary focus, setting up Criterion benchmarking was relatively high on the to-do list. I found the setup incredibly straightforward. The framework is very friendly and even gives feedback on slow benchmarks, suggesting a specific sample count to reduce to. It reminded me a bit of running statistical tests in R.
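As a concrete illustration, here is a minimal Criterion harness of the kind described above. `hash_fn` is a stand-in workload, not imgdd's actual API:

```rust
use criterion::{criterion_group, criterion_main, Criterion};
use std::hint::black_box;

// Placeholder workload standing in for a real hashing function.
fn hash_fn(data: &[u8]) -> u64 {
    data.iter().fold(0u64, |acc, &b| acc.rotate_left(5) ^ b as u64)
}

fn bench_hash(c: &mut Criterion) {
    let data = vec![0u8; 64 * 64]; // 64x64 grayscale-sized buffer
    c.bench_function("hash_fn", |b| b.iter(|| hash_fn(black_box(&data))));
}

criterion_group!(benches, bench_hash);
criterion_main!(benches);
```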
The next strategic focus was matching my hashing implementations against imagehash and other researched implementations of common perceptual hash methods. aHash, mHash, and dHash are all relatively straightforward; pHash and wHash are more involved and require more nuance in their implementation.
🔥 Developing the Algorithms
Overview
Writing the algorithm code was more an exercise in optimizing each function's time complexity than anything else. There is plenty of information available covering the algorithm steps and the specific math required to accomplish the intended result.
Alongside verifying the algorithm math/logic, I checked the output against imagehash. I generated a red/black alternating pixel image and used the same cat image from this article to verify the hash, making sure imgdd's output was identical to imagehash's.
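For illustration, a fixture along those lines can be generated with the `image` crate; this is a sketch, not the exact test code from imgdd:

```rust
use image::{DynamicImage, Rgb, RgbImage};

// Build an alternating red/black checkerboard image for hash verification.
fn checkerboard(width: u32, height: u32) -> DynamicImage {
    let img = RgbImage::from_fn(width, height, |x, y| {
        if (x + y) % 2 == 0 {
            Rgb([255, 0, 0]) // red
        } else {
            Rgb([0, 0, 0]) // black
        }
    });
    DynamicImage::ImageRgb8(img)
}
```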
Algorithm Details
**aHash (Average Hash):**
- Compares each pixel to the average pixel value to generate a 64-bit binary hash.
```rust
pub fn ahash(image: &DynamicImage) -> Result<Self> {
    let mut sum = 0u64;
    let mut pixels = [0u8; 64];
    // Collect pixel values and compute sum
    for (i, (_, _, pixel)) in image.pixels().enumerate().take(64) {
        pixels[i] = pixel[0];
        sum += pixels[i] as u64;
    }
    // Collect average pixel value
    let avg = sum / 64;
    // Compute hash and store bits in the correct order
    let mut hash = 0u64;
    for (i, &pixel) in pixels.iter().enumerate() {
        if pixel as u64 > avg {
            hash |= 1 << (63 - i); // reverse order
        }
    }
    Ok(Self { hash })
}
```
**mHash (Median Hash):**
- Compares each pixel to the median pixel value to generate a 64-bit binary hash.
```rust
pub fn mhash(image: &DynamicImage) -> Result<Self> {
    let mut pixels = [0u8; 64];
    // Collect 64 pixel values
    for (i, pixel) in image.pixels().map(|p| p.2[0]).take(64).enumerate() {
        pixels[i] = pixel;
    }
    // Copy pixels so we don't modify the original array
    let mut pixels_copy = pixels;
    // Find median O(n)
    let mid = 32;
    let (low, median_hi, _high) = pixels_copy.select_nth_unstable(mid);
    // The lower partition is unordered, so take its max rather than low[mid - 1]
    let median_lo = *low.iter().max().unwrap();
    let median = (*median_hi as u64 + median_lo as u64) / 2; // Compute true median
    // Compute hash
    let mut hash = 0u64;
    for (i, &pixel) in pixels.iter().enumerate() {
        if pixel as u64 > median {
            hash |= 1 << (63 - i); // reverse order
        }
    }
    Ok(Self { hash })
}
```
**dHash (Difference Hash):**
- Encodes relative changes between adjacent pixels in a 64-bit binary hash.
```rust
pub fn dhash(image: &DynamicImage) -> Result<Self> {
    let mut hash = 0u64;
    // Expects a 9x8 grayscale image: 8 left-to-right comparisons per row
    for y in 0..8 {
        let mut current = image.get_pixel(0, y)[0];
        for x in 1..9 {
            let next = image.get_pixel(x, y)[0];
            hash = (hash << 1) | (next > current) as u64;
            current = next;
        }
    }
    Ok(Self { hash })
}
```
**pHash (Perceptual Hash):**
- **Overview:**
  - pHash analyzes the frequency domain of an image using the Discrete Cosine Transform (DCT).
  - Start with a 32×32 grayscale image.
  - A two-dimensional DCT is applied—first row-wise, then column-wise—to convert the spatial pixel values into frequency coefficients.
  - The top-left 8×8 block of DCT coefficients is extracted. These coefficients capture the low-frequency content (the overall structure) of the image.
  - The median of the resulting coefficients (excluding the DC component) is computed.
  - Each coefficient is compared against the median, generating a 64-bit binary hash.
- **Algorithm Details:**
  - **DCT Transformation:**
    - The DCT is applied in two passes (row- and column-wise) to obtain a frequency representation of the image.
    - The coefficient at position (0,0) (the DC component) is typically the largest since it represents the image’s average brightness.
  - **DCT-II (Forward Transform):** For an input sequence $x_n$, $n = 0, \dots, N-1$, the DCT-II is given by:

    $$X_k = \sum_{n=0}^{N-1} x_n \cos\left[\frac{\pi}{N}\left(n + \tfrac{1}{2}\right)k\right], \qquad k = 0, \dots, N-1$$

  - **DCT-II (Inverse Transform):** To reconstruct the original sequence from the DCT-II coefficients $X_k$, the inverse transform is:

    $$x_n = \frac{1}{N} X_0 + \frac{2}{N} \sum_{k=1}^{N-1} X_k \cos\left[\frac{\pi}{N}\left(n + \tfrac{1}{2}\right)k\right]$$

- **Implementation:**
```rust
pub fn phash(image: &DynamicImage) -> Result<Self> {
    const IMG_SIZE: usize = 32;
    const HASH_SIZE: usize = 8;
    // Collect pixel values from normalized 32x32 grayscale image
    let mut pixels: Vec<f32> = image
        .pixels()
        .map(|p| p.2[0] as f32)
        .collect();
    // Plan DCT once for both rows and columns
    let mut planner = DctPlanner::new();
    let dct = planner.plan_dct2(IMG_SIZE);
    // Apply DCT row-wise in-place
    for row in pixels.chunks_exact_mut(IMG_SIZE) {
        dct.process_dct2(row);
    }
    // Apply DCT column-wise in-place
    for col in 0..IMG_SIZE {
        let mut col_values: [f32; IMG_SIZE] = [0.0; IMG_SIZE];
        for row in 0..IMG_SIZE {
            col_values[row] = pixels[row * IMG_SIZE + col];
        }
        dct.process_dct2(&mut col_values);
        for row in 0..IMG_SIZE {
            pixels[row * IMG_SIZE + col] = col_values[row];
        }
    }
    // Extract top-left 8x8 DCT coefficients (low frequencies)
    let mut dct_lowfreq = [0f32; HASH_SIZE * HASH_SIZE];
    for y in 0..HASH_SIZE {
        for x in 0..HASH_SIZE {
            dct_lowfreq[y * HASH_SIZE + x] = pixels[y * IMG_SIZE + x];
        }
    }
    // Compute median excluding DC coefficient
    let mut ac_coeffs = dct_lowfreq[1..].to_vec();
    let mid = ac_coeffs.len() / 2;
    ac_coeffs.select_nth_unstable_by(mid, |a, b| a.partial_cmp(b).unwrap());
    let median = ac_coeffs[mid];
    // Generate hash
    let mut hash = 0u64;
    for (i, &val) in dct_lowfreq.iter().enumerate() {
        if val > median {
            hash |= 1 << (63 - i);
        }
    }
    Ok(Self { hash })
}
```
**wHash (Wavelet Hash):**
- **Overview:**
  - Uses Haar wavelet transformations to capture image features.
  - Starts with an 8×8 grayscale image.
  - The maximum decomposition level is computed as $\text{max\_level} = \log_2(\min(W, H)) = \log_2(8) = 3$. This means you can apply the Haar decomposition 3 times before the lowest-resolution (LL) sub-band becomes 1×1.
  - A 3-level forward Haar transform is performed to produce a vector of coefficients.
  - The four resulting sub-bands are:
    - LL: Low–Low (approximation; overall average information)
    - HL: High–Low (horizontal details)
    - LH: Low–High (vertical details)
    - HH: High–High (diagonal details)
  - The DC component (which captures the overall brightness) is then zeroed out before the inverse transform is applied.
  - The inverse Haar transform reconstructs the image (without the DC component), and the resulting pixel values are compared to their median to generate a 64-bit binary hash.
- **Algorithm Details:**
  - **2D Haar Transform**
    - **Forward:**
      - Row transform (pairwise, using the orthonormal convention):

        $$a_i = \frac{x_{2i} + x_{2i+1}}{\sqrt{2}}, \qquad d_i = \frac{x_{2i} - x_{2i+1}}{\sqrt{2}}$$

      - Column transform: the same averaging/differencing step applied down each column of the row-transformed result.
    - **Inverse:**
      - Inverse column transform, then inverse row transform, each reconstructing the original pairs:

        $$x_{2i} = \frac{a_i + d_i}{\sqrt{2}}, \qquad x_{2i+1} = \frac{a_i - d_i}{\sqrt{2}}$$

  - **DC Coefficient:**
    - For an 8×8 image with pixel values $p_{i,j}$, the DC coefficient is (up to the transform's normalization) roughly proportional to the sum of all pixels, i.e. the overall average brightness:

      $$\text{DC} \propto \sum_{i=0}^{7} \sum_{j=0}^{7} p_{i,j}$$

    - This coefficient is usually much larger than the other (AC) coefficients that capture edges and textures.
  - **Why Zero It Out?**
    - Robustness to illumination: since the DC component reflects the overall brightness, removing it makes the hash less sensitive to global illumination differences.
    - Preventing median skew: because the DC value is relatively large, including it in the median calculation could skew the threshold. Zeroing it ensures that the hash focuses on structural details rather than overall brightness.
```rust
pub fn whash(image: &DynamicImage) -> Result<Self> {
    const HASH_SIZE: u32 = 8;
    let ll_max_level: usize = 3;
    // Allocate flat vector of normalized pixels (row-major order).
    let total_pixels = (HASH_SIZE * HASH_SIZE) as usize;
    let mut pixels = Vec::with_capacity(total_pixels);
    for y in 0..HASH_SIZE {
        for x in 0..HASH_SIZE {
            let pixel = image.get_pixel(x, y);
            pixels.push(pixel[0] as f32 / 255.0);
        }
    }
    // ---------- Remove low-level frequency (DC) component ---------- //
    // Perform a full forward Haar transform - 8×8 image (3 levels).
    pixels.transform(Operation::Forward, &Haar::new(), ll_max_level);
    // Zero out the DC coefficient.
    pixels[0] = 0.0;
    // Perform inverse Haar transform (reconstruct image).
    pixels.transform(Operation::Inverse, &Haar::new(), ll_max_level);
    // ---------- Compute median O(n) ---------- //
    let mid: usize = 32;
    // Clone flat pixel vector.
    let mut flat = pixels.clone();
    // Quickselect: partition around the 33rd-smallest value.
    let (low, hi, _high) = flat.select_nth_unstable_by(mid, |a, b| a.partial_cmp(b).unwrap());
    // The lower partition is unordered, so take its max for the other middle value.
    let lo = low.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    // Compute median.
    let median = (lo + *hi) / 2.0;
    // Generate hash.
    let mut hash = 0u64;
    for (i, &val) in pixels.iter().enumerate() {
        if val > median {
            hash |= 1 << (63 - i);
        }
    }
    Ok(Self { hash })
}
```
✅ Lesson Learned
Running tests consistently will save you a lot of heartache as an engineer: even though I was just updating my docs, I ran the typical battery of tests only to notice about a 60% regression in my benchmarks. When I built the MkDocs site, my signature didn't include the intended defaults, since I was unwrapping the `None`s later in the function, a byproduct of developing the `imgddcore` and interface crates before the `imgddpy` crate. However, swapping the signature had a much larger impact than I expected.
```rust
// Docs-friendly signature with baked-in defaults (this caused the regression):
#[pyfunction(signature = (path, filter = "triangle", algo = "dhash", sort = false))]

// Original signature: defaults deferred via `None` and resolved in the function body:
#[pyfunction(signature = (path, filter = None, algo = None, sort = false))]
```
❌ The Impact
After making this change, benchmarks showed a significant performance drop. In my early comparison script, the results changed from:
Benchmark Results (in seconds per image):

| Metric      | imgdd    | imagehash |
|-------------|----------|-----------|
| min_time    | 0.002266 | 0.006736  |
| max_time    | 0.003773 | 0.010739  |
| avg_time    | 0.002766 | 0.008363  |
| median_time | 0.002671 | 0.008147  |
to:
Benchmark Results (in seconds per image):

| Metric      | imgdd    | imagehash |
|-------------|----------|-----------|
| min_time    | 0.006906 | 0.006850  |
| max_time    | 0.022984 | 0.009187  |
| avg_time    | 0.010833 | 0.007882  |
| median_time | 0.008730 | 0.007837  |
This represents roughly a 66% performance regression.
In the end, I reverted to using `None` for the default parameters in the PyO3 signature to ensure optimal performance. For the docs, I removed the signatures from the autodocs and provided the intended signature in the PyO3 function docstring.
```yaml
plugins:
  - include-markdown
  - mkdocstrings:
      handlers:
        python:
          options:
            show_signature: false # remove provided function signature
```
**Takeaways:**
- The PyO3 function signature conversion is less efficient than handling it within the function.
- Deferring conversion with `unwrap` at the point of use is significantly faster than embedding defaults directly in the signature (see the sketch below).
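For reference, here is a hypothetical sketch of the faster pattern: `Option` defaults in the signature, resolved with cheap unwraps in the body. The function name, parameters shown, and return type are placeholders rather than imgdd's real API:

```rust
use pyo3::prelude::*;

#[pyfunction(signature = (path, filter = None, algo = None, sort = false))]
fn hash_files(path: &str, filter: Option<&str>, algo: Option<&str>, sort: bool) -> PyResult<String> {
    // Defaults applied at the point of use instead of in the signature.
    let filter = filter.unwrap_or("triangle");
    let algo = algo.unwrap_or("dhash");
    Ok(format!("{path}:{filter}:{algo}:{sort}"))
}
```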
🎉 The Outcome
After a good amount of early refactoring, the results have been extremely encouraging. Not only did I manage to boost performance significantly—imgdd now consistently outperforms imagehash by roughly 60–95%—but I've also gotten to experience firsthand how friendly Rust can be. After spending the last few years practicing functional programming, I appreciated Rust's blend of low-level control with pseudo-functional paradigms. Here are some of my key takeaways:
- Rust's publishing tools (the dry-run flag, simple Cargo.toml configuration) are intuitive.
- Rust's built-in formatter rearranges imports in alphabetical order, which satisfies my OCD.
- Criterion has quickly become one of my favorite benchmarking libraries.
- I'm sure I've broken a cardinal Rust rule somewhere along the way; however, the overall development experience with Rust and Rust bindings was very positive.
💡 Stuff I Learned
**Enhanced Release & Packaging Strategies:**
- More familiar with the structure of multi-crate release patterns (thanks, polars!).
- More familiar with ABI compatibility and building multi-system packages, released in a matrix for PyPI upload.

**Efficient Interoperability:**
- More familiar with developing efficient PyO3 bindings.
- Deepened my understanding of efficient interop data handling.

**Algorithmic Mastery:**
- Implemented both the Haar wavelet transform and the discrete cosine transform for image vectors.
- Reinforced effective algorithmic implementation patterns.
- Learned reliable low-level optimization patterns.

**Robust Code Quality:**
- Experience implementing code coverage, benchmarking, and integration testing across a multi-interface interop library.
In short, this project not only produced a high-performance image hashing library but also served as a comprehensive exercise in getting my hands Rusty.
📌 References
Perceptual Hashing
- imagehash on GitHub
- Testing Different Image Hash Functions
- Perceptual Image Hashing Article (ScienceDirect)
- Evaluating Perceptual Image Hashes at OkCupid
- CEUR Workshop Proceedings on Perceptual Image Hashing
Discrete Cosine Transform
- Rust DCT Documentation
- Discrete Cosine Transform – CCRMA, Stanford
- Lecture Slides on DCT (Grainger, Illinois)
- Cosine Transform – MTU Teaching Material