PRASANNA G
How I Built a 95% Accurate Defect Detection System with an ESP32-CAM and Python

Manual quality control is a pain. It's slow, expensive, and prone to human error. While high-end machine vision systems exist, they are often too expensive for small shops or hobbyist projects.

I wanted to build a system that was:

Low-cost: Using the super-cheap ESP32-CAM module.

Scalable: Easy to adapt to new products.

Real-time: Fast enough for a production line.

Most people would jump straight to a complex deep learning model like YOLO or a big CNN. But that comes with its own set of problems:

Data Hunger: You need thousands of labeled defect images to train the model.

"Black Box" Problem: When it fails, it's hard to know why.

Heavy Hardware: These models often need a GPU to run in real-time, which kills the "low-cost" goal.

So, I tried a different approach. Instead of teaching a complex AI what a "defect" looks like, I decided to just teach my system what a perfect product looks like.

My secret weapon? A classical computer vision technique called the Structural Similarity Index (SSIM).

💡 The Tech Stack
The architecture is a simple client-server model:

Client: An ESP32-CAM module. Its only job is to capture images of the product (in my case, a washing machine outer component) and send each frame to the server over Wi-Fi.

Server: A Python Flask server. This is the "brain." It runs on a PC and does all the heavy lifting.

The "Brain": The OpenCV library. It receives images, cleans them up, and runs the SSIM comparison.

Here’s the basic flow: ESP32-CAM capture → HTTP POST over Wi-Fi → Flask server → OpenCV pre-processing → SSIM comparison → pass/fail decision.

⚙️ How It Works: The SSIM Workflow
The core idea is simple: I store a "golden" (defect-free) reference image on the server. Then, for every new image that comes from the ESP32-CAM, the server compares it to that golden image.

Here's the step-by-step process:

  1. Capture and Transmit
    The ESP32-CAM is set to capture 640x480 resolution frames. It sends each frame to a dedicated endpoint on my Flask server using a simple HTTP POST request.

  2. Pre-processing (The Most Important Step!)
    A raw image from a factory floor is messy. Lighting changes, and the camera or part might be slightly misaligned. Before I can do any comparison, I have to "normalize" the image.

Geometric Alignment: First, the server aligns the incoming image with the golden template to correct for any minor shifts or rotations.

Histogram Equalization: This normalizes the image's brightness and contrast, making the system robust against flickering or changing ambient light.

Gaussian Blur: A minor blur is applied to remove high-frequency sensor noise that could otherwise be flagged as a defect.

  3. Analysis with SSIM
    SSIM is a metric that measures the similarity between two images based on how a human would perceive them (it looks at structure, luminance, and contrast). It returns a score between -1 and 1, where 1 means the images are identical.

Instead of one global comparison, I use a sliding window to create an SSIM "heat map". Any region with a significant structural difference (like a scratch or crack) will produce a very low SSIM score, instantly highlighting the defect.
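With scikit-image, the windowed SSIM map comes almost for free via the `full=True` flag — one straightforward way to build the heat map (the post doesn't name the SSIM library, and the 0.5 mask threshold below is illustrative):

```python
# SSIM heat-map sketch using scikit-image. full=True returns the
# per-window SSIM map alongside the global score; thresholding that
# map localizes defects. The 0.5 mask threshold is illustrative.
import numpy as np
from skimage.metrics import structural_similarity

def ssim_heatmap(golden, frame):
    score, ssim_map = structural_similarity(golden, frame, full=True)
    defect_mask = ssim_map < 0.5  # low-similarity regions = candidate defects
    return score, ssim_map, defect_mask
```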

  4. The Decision
    The system checks the SSIM scores in key Regions of Interest (ROIs), like the hinge, seal, and glass. If the score in any ROI drops below a calibrated threshold (e.g., < 0.85), the server outputs a "Defective" decision.
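The decision step can then be a few lines over the heat map. The ROI rectangles below are made-up placeholders; in practice both the coordinates and the 0.85 threshold would be calibrated per product.

```python
# ROI-based pass/fail decision over an SSIM heat map. The ROI
# rectangles below are hypothetical placeholders; real coordinates
# would be calibrated per product model.
import numpy as np

ROIS = {            # name: (y0, y1, x0, x1) in pixel coordinates
    "hinge": (0, 40, 0, 40),
    "seal":  (40, 80, 0, 40),
    "glass": (0, 80, 40, 80),
}
THRESHOLD = 0.85

def decide(ssim_map):
    failed = []
    for name, (y0, y1, x0, x1) in ROIS.items():
        roi_score = float(ssim_map[y0:y1, x0:x1].mean())
        if roi_score < THRESHOLD:
            failed.append((name, round(roi_score, 3)))
    return {"status": "Defective" if failed else "OK", "failed_rois": failed}
```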

📊 The Results: It Actually Works!
I tested this system on a public dataset of 1,000 images (a 50/50 split of good and defective parts).

Accuracy: 95% overall accuracy.

Speed: 140 ms average end-to-end latency per frame (from capture to decision) on a standard quad-core CPU. That's fast enough for real-time inspection.

Robustness: The SSIM method was more robust to lighting changes than a traditional ML (HOG+SVM) approach.


When I compared my SSIM approach to a more "traditional" machine learning pipeline (HOG features + an SVM classifier), my system was 33% faster (140 ms vs. 210 ms) with comparable accuracy (95% vs. 94%).

✅ Why This Approach is Awesome
No Model Training: This is the biggest win. I don't need a massive, labeled dataset. I just need a small set of "golden" reference images.

It's a "White Box": Unlike a neural network, this system is 100% explainable. If a defect is flagged, I can output the SSIM heat map and see exactly which pixels caused the failure.

Super Flexible: Need to inspect a new product model? Just drop a new "golden" image into the server's template library. No retraining, no new code.

🚀 What's Next?
This SSIM-based system proved to be a simple, robust, and practical alternative to complex deep learning models for this use case.

The next steps are to make it even more robust:

Dynamic Template Management: Automatically update the "golden" reference image to account for gradual wear and tear.

Hybrid Approach: Use a lightweight neural network to find potential defects and then use SSIM to confirm them.

Smart Factory Integration: Integrate with IoT protocols like MQTT to send alerts directly to a dashboard or stop the production line.

Thanks for reading! Let me know what you think in the comments.
