Manual quality control is a pain. It's slow, expensive, and prone to human error. High-end machine vision systems exist, but they are often too expensive for small shops or hobbyist projects.
I wanted to build a system that was:
Low-cost: Using the super-cheap ESP32-CAM module.
Scalable: Easy to adapt to new products.
Real-time: Fast enough for a production line.
Most people would jump straight to a complex deep learning model like YOLO or a big CNN. But that comes with its own set of problems:
Data Hunger: You need thousands of labeled defect images to train the model.
"Black Box" Problem: When it fails, it's hard to know why.
Heavy Hardware: These models often need a GPU to run in real time, which kills the "low-cost" goal.
So, I tried a different approach. Instead of teaching a complex AI what a "defect" looks like, I decided to just teach my system what a perfect product looks like.
My secret weapon? A classical computer vision technique called the Structural Similarity Index (SSIM).
💡 The Tech Stack
The architecture is a simple client-server model:
Client: An ESP32-CAM module. Its only job is to capture images of the product (in my case, a washing machine outer component) and send them to the server.
Server: A Python Flask server. This is the "brain." It runs on a PC and does all the heavy lifting.
The "Brain": The OpenCV library. It receives images, cleans them up, and runs the SSIM comparison.
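Here's the basic flow: the ESP32-CAM captures a frame and POSTs it to the Flask server, which pre-processes it with OpenCV, runs the SSIM comparison, and outputs a decision.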
⚙️ How It Works: The SSIM Workflow
The core idea is simple: I store a "golden" (defect-free) reference image on the server. Then, for every new image that comes from the ESP32-CAM, the server compares it to that golden image.
Here's the step-by-step process:
Capture and Transmit
The ESP32-CAM is set to capture 640x480 frames. It sends each frame to a dedicated endpoint on my Flask server using a simple HTTP POST request.
Pre-processing (The Most Important Step!)
A raw image from a factory floor is messy. Lighting changes, and the camera or part might be slightly misaligned. Before I can do any comparison, I have to "normalize" the image.
Geometric Alignment: First, the server aligns the incoming image with the golden template to correct for any minor shifts or rotations.
Histogram Equalization: This normalizes the image's brightness and contrast, making the system robust against flickering or changing ambient light.
Gaussian Blur: A minor blur is applied to remove high-frequency sensor noise that could otherwise be flagged as a defect.
Analysis with SSIM
SSIM is a metric that measures the similarity between two images based on how a human would perceive them (it looks at structure, luminance, and contrast). It returns a score between -1 and 1, where 1 means the images are identical.
Instead of one global comparison, I use a sliding window to create an SSIM "heat map". Any region with a significant structural difference (like a scratch or crack) will produce a very low SSIM score, instantly highlighting the defect.
The Decision
The system checks the SSIM scores in key Regions of Interest (ROIs), like the hinge, seal, and glass. If the score in any ROI drops below a calibrated threshold (e.g., 0.85), the server outputs a "Defective" decision.
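The decision rule is small enough to sketch in a few lines. It takes an SSIM heat map as a NumPy array. The ROI boxes below are made-up coordinates for a 640x480 frame, averaging the map inside each ROI is an assumption about how the per-ROI score is formed, and the "OK" label for passing parts is also my own placeholder.

```python
import numpy as np

# Hypothetical ROI boxes as (x, y, width, height) in a 640x480 frame.
ROIS = {
    "hinge": (20, 180, 100, 120),
    "seal": (160, 60, 320, 360),
    "glass": (220, 120, 200, 240),
}
THRESHOLD = 0.85  # example value from the text; calibrate per product


def classify(ssim_heat_map, rois=ROIS, threshold=THRESHOLD):
    """Return (decision, per-ROI scores, names of ROIs below the threshold)."""
    scores = {}
    for name, (x, y, w, h) in rois.items():
        # Assumption: the ROI score is the mean SSIM inside the box.
        scores[name] = float(ssim_heat_map[y:y + h, x:x + w].mean())
    failed = [name for name, score in scores.items() if score < threshold]
    decision = "Defective" if failed else "OK"
    return decision, scores, failed
```

Returning the failing ROI names along with the decision keeps the result easy to explain: the operator sees which part of the component triggered the rejection.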
📊 The Results: It Actually Works!
I tested this system on a public dataset of 1,000 images (a 50/50 split of good and defective parts).
Accuracy: 95% overall accuracy.
Speed: 140 ms average end-to-end latency per frame (from capture to decision) on a standard quad-core CPU. That's fast enough for real-time inspection.
Robustness: The SSIM method was more robust to lighting changes than a traditional ML (HOG+SVM) approach.
[Images: the system in action]
When I compared my 140 ms SSIM approach to a more "traditional" machine learning pipeline (HOG features + an SVM classifier), my system took 33% less time per frame (140 ms vs. 210 ms) and achieved comparable accuracy (95% vs. 94%).
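For anyone who wants to check the speed numbers, here is the arithmetic as a tiny script. The frames-per-second figure assumes frames are processed strictly one at a time, which is my assumption and not something measured above.

```python
ssim_ms = 140.0      # average end-to-end latency of the SSIM pipeline
hog_svm_ms = 210.0   # average latency of the HOG + SVM pipeline

# Fraction of per-frame time saved relative to HOG + SVM.
latency_reduction = (hog_svm_ms - ssim_ms) / hog_svm_ms
# How many SSIM frames fit in the time of one HOG + SVM frame.
throughput_ratio = hog_svm_ms / ssim_ms
# Sequential throughput of the SSIM pipeline.
frames_per_second = 1000.0 / ssim_ms

print(f"latency reduction: {latency_reduction:.0%}")  # 33%
print(f"throughput ratio: {throughput_ratio:.2f}x")   # 1.50x
print(f"frames per second: {frames_per_second:.1f}")  # 7.1
```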
✅ Why This Approach is Awesome
No Model Training: This is the biggest win. I don't need a massive, labeled dataset. I just need a small set of "golden" reference images.
It's a "White Box": Unlike a neural network, this system is easy to explain. If a defect is flagged, I can output the SSIM heat map and see exactly which regions caused the failure.
Super Flexible: Need to inspect a new product model? Just drop a new "golden" image into the server's template library. No retraining, no new code.
🚀 What's Next?
This SSIM-based system proved to be a simple, robust, and practical alternative to complex deep learning models for this use case.
The next steps are to make it even more robust:
Dynamic Template Management: Automatically update the "golden" reference image to account for gradual wear and tear.
Hybrid Approach: Use a lightweight neural network to find potential defects and then use SSIM to confirm them.
Smart Factory Integration: Integrate with IoT protocols like MQTT to send alerts directly to a dashboard or stop the production line.
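As a rough idea of what dynamic template management could look like, here is a sketch that nudges the golden image toward each frame that passed inspection, using an exponential moving average. Nothing here is implemented yet; the `update_golden` name, the `alpha` value, and the "only update on passing frames" rule are all assumptions.

```python
import numpy as np


def update_golden(golden, frame, passed, alpha=0.05):
    """Blend a passing frame into the golden image; ignore failing frames.

    `golden` and `frame` are same-size uint8 images. `alpha` controls how
    quickly the template adapts (0 = never, 1 = replace outright).
    """
    if not passed:
        return golden
    blended = (1.0 - alpha) * golden.astype(np.float32) + alpha * frame.astype(np.float32)
    return np.clip(np.rint(blended), 0, 255).astype(np.uint8)
```

A small `alpha` keeps the template stable, but any scheme like this needs care: a defect that develops slowly enough could be absorbed into the reference.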
Thanks for reading! Let me know what you think in the comments.