DEV Community

Maksymilian

Scalable Image Comparison: An Open-Source Java Library

I’ve been working on an open-source Java library designed for scalable, multi-stage image comparison. It allows you to mix and match strategies (like CRC32 checksum and perceptual hashing) to de-duplicate massive collections efficiently.

The core design is modular, so you can implement your own strategies for both grouping and comparison. For example:

  • Combine CRC32Grouper + PHash + PixelByPixel to identify duplicates.
  • Use a metadata-based Grouper + PerceptualHash to identify similar images.
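
To make the multi-stage idea concrete, here is a minimal sketch of a "group cheaply, then verify" pipeline. It is not the library's actual API: the class `DedupSketch` and the methods `groupByCrc32` and `findDuplicates` are hypothetical names chosen for illustration. It also simplifies the chain above by skipping the perceptual-hash stage and comparing raw bytes rather than decoded pixels.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.CRC32;

// Hypothetical sketch, not the library's real classes.
public class DedupSketch {

    /** Stage 1: bucket images by the CRC32 checksum of their bytes (cheap). */
    static Map<Long, List<String>> groupByCrc32(Map<String, byte[]> images) {
        Map<Long, List<String>> groups = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> e : images.entrySet()) {
            CRC32 crc = new CRC32();
            crc.update(e.getValue());
            groups.computeIfAbsent(crc.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        return groups;
    }

    /** Stage 2: confirm candidates inside each bucket with an exact byte comparison. */
    static List<List<String>> findDuplicates(Map<String, byte[]> images) {
        List<List<String>> result = new ArrayList<>();
        for (List<String> bucket : groupByCrc32(images).values()) {
            if (bucket.size() < 2) {
                continue; // a single-element bucket cannot contain duplicates
            }
            // Split the bucket into sets of identical content to guard against CRC collisions.
            List<List<String>> confirmed = new ArrayList<>();
            for (String name : bucket) {
                boolean placed = false;
                for (List<String> set : confirmed) {
                    if (Arrays.equals(images.get(set.get(0)), images.get(name))) {
                        set.add(name);
                        placed = true;
                        break;
                    }
                }
                if (!placed) {
                    List<String> fresh = new ArrayList<>();
                    fresh.add(name);
                    confirmed.add(fresh);
                }
            }
            for (List<String> set : confirmed) {
                if (set.size() > 1) {
                    result.add(set);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, byte[]> images = new LinkedHashMap<>();
        images.put("a.png", new byte[] {1, 2, 3});
        images.put("b.png", new byte[] {1, 2, 3});
        images.put("c.png", new byte[] {9, 9, 9});
        System.out.println(findDuplicates(images)); // prints [[a.png, b.png]]
    }
}
```

The point of the cheap first stage is that the expensive comparison only runs inside small buckets, never across the whole collection. A perceptual-hash stage could sit between the two steps in the same way.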

I’d love to hear your feedback:

  • Does this approach make sense for large-scale scenarios?
  • What could I improve to make it more extensible?

Here’s the repository: LINK.

If you have ideas for new features or want to contribute, feel free to open an issue or submit a PR. Any thoughts are appreciated!
