I remember the first time I tried to teach a computer to see. I had a folder of blurry photos from a family vacation and wanted to write a program that could automatically find all the pictures containing a sunset. I thought it would be simple — just look for orange pixels. But the images were taken at different times of day, with different cameras, and some were rotated. That’s when I realized that image processing is not about magic; it’s about building a careful, step-by-step pipeline that turns raw pixels into useful information. Over the years, I’ve learned eight core techniques using Python libraries like OpenCV and Pillow that handle almost any image problem you’ll face. I want to share them with you the way I wish someone had explained them to me: simply, with lots of code, and without unnecessary jargon.
Let’s start at the very beginning: reading an image from disk. Both OpenCV and Pillow can do this, but they have a small difference that tripped me up more than once. OpenCV loads images in BGR order — blue, green, red — while Pillow uses standard RGB. If you display an OpenCV image without conversion, everything looks bluish and wrong. I once spent an hour debugging why my processed images looked like they belonged under a blacklight. The fix is a single line: cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB).
import cv2
import numpy as np
from PIL import Image

def open_image(path):
    # OpenCV way: returns a numpy array in BGR order
    img = cv2.imread(path)
    if img is None:
        print("Could not load image. Check the file path.")
        return None, None
    # Convert to RGB for proper display
    img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    # Pillow way: returns an Image object, already in RGB
    pil_img = Image.open(path)
    return img_rgb, pil_img
Saving is equally straightforward. I always prefer OpenCV for saving because it handles compression parameters more explicitly. One caveat: cv2.imwrite expects BGR order, so convert back with cv2.COLOR_RGB2BGR if you have been working in RGB. For example, when saving JPEGs, you can set quality:
cv2.imwrite('output.jpg', img, [cv2.IMWRITE_JPEG_QUALITY, 95])
Now, why would you need two libraries? In my work, I use Pillow for quick thumbnail generation and format conversion, and OpenCV for everything else — especially when I need to work with arrays and apply mathematical operations. Pillow’s Image object is friendly for simple edits like high-quality resizing with Image.Resampling.LANCZOS (the old Image.ANTIALIAS constant was removed in Pillow 10), but OpenCV gives you direct access to numpy arrays, which is essential for the techniques we’re about to explore.
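To make that concrete, here is a minimal thumbnail helper, assuming Pillow 9.1 or newer for the Resampling enum:

from PIL import Image

def make_thumbnail(path, out_path, size=(256, 256)):
    img = Image.open(path)
    # thumbnail() resizes in place and preserves aspect ratio
    img.thumbnail(size, Image.Resampling.LANCZOS)
    img.save(out_path)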
The second technique is color space conversion. Think of it as translating an image into different languages where certain tasks become easier. RGB is good for display, but if you want to detect a red stop sign under changing sunlight, you’re better off using HSV (Hue, Saturation, Value). Hue represents the color, saturation the intensity, and value the brightness. I once built a system to track a green tennis ball; by converting the frame to HSV and thresholding on hue, I could ignore shadows and reflections that would have ruined a simple threshold on the red channel.
def find_green_ball(image):
    hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
    # Green hue range: pure green sits near 60 on OpenCV's 0-180 hue scale
    lower = np.array([50, 100, 100])
    upper = np.array([70, 255, 255])
    mask = cv2.inRange(hsv, lower, upper)
    return mask
Another useful color space is LAB, where L stands for lightness, and A and B represent color-opponent dimensions. If you need to adjust contrast without messing up colors, separate the L channel, apply equalization, and merge back. I used this when enhancing old photos for a friend’s archive. The difference was striking — details emerged from shadows without turning people into aliens.
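A minimal sketch of that idea, using CLAHE (adaptive histogram equalization) on the L channel only; the clip limit and tile size below are starting values, not tuned constants:

def enhance_contrast(image_bgr):
    lab = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    # Equalize only the lightness channel so colors stay untouched
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    merged = cv2.merge((clahe.apply(l), a, b))
    return cv2.cvtColor(merged, cv2.COLOR_LAB2BGR)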
Geometric transformations are my third technique, and they’re the workhorses of any pipeline that needs to align, resize, or correct perspective. I once had to process scans of handwritten notes that were slightly rotated and stretched because the pages were bound in a book. I used OpenCV’s getPerspectiveTransform to map the four corners of the page to a rectangle. It felt like magic the first time the text straightened out.
def deskew_page(image):
    h, w = image.shape[:2]
    # Assume you have already found the four corner points of the page
    src_points = np.float32([[50, 100], [w-100, 80], [30, h-50], [w-60, h-30]])
    dst_points = np.float32([[0, 0], [w, 0], [0, h], [w, h]])
    M = cv2.getPerspectiveTransform(src_points, dst_points)
    corrected = cv2.warpPerspective(image, M, (w, h))
    return corrected
Resizing and rotation are simpler. I always specify the interpolation method because a thumbnail needs INTER_AREA, while enlarging needs INTER_LINEAR or INTER_CUBIC. Remember that rotating without cropping can lose content — unless you expand the canvas. I wrote a helper that computes the new bounding box for any angle.
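Here is one way such a helper might look; it derives the expanded canvas size from the rotation matrix and shifts the transform so nothing gets clipped:

def rotate_expand(image, angle_deg):
    h, w = image.shape[:2]
    center = (w / 2, h / 2)
    M = cv2.getRotationMatrix2D(center, angle_deg, 1.0)
    # The matrix's cos/sin terms give the rotated bounding box
    cos, sin = abs(M[0, 0]), abs(M[0, 1])
    new_w = int(h * sin + w * cos)
    new_h = int(h * cos + w * sin)
    # Shift the transform so the result is centered on the new canvas
    M[0, 2] += new_w / 2 - center[0]
    M[1, 2] += new_h / 2 - center[1]
    return cv2.warpAffine(image, M, (new_w, new_h))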
Filtering, the fourth technique, is about removing noise and sharpening features. The first time I applied a Gaussian blur to an image, I couldn’t believe how much random grain disappeared. It’s like smearing a foggy window with your finger — you lose some detail but gain clarity of the big picture. For medical images, I use median blur because it kills salt-and-pepper noise without blurring edges as much.
def denoise(image, method='gaussian'):
    if method == 'gaussian':
        return cv2.GaussianBlur(image, (5, 5), 1)
    elif method == 'median':
        return cv2.medianBlur(image, 5)
    elif method == 'bilateral':
        # Smooths flat regions while preserving edges
        return cv2.bilateralFilter(image, 9, 75, 75)
    raise ValueError(f"Unknown method: {method}")
Edge detection is where filtering really shines. Canny edge detection is my go-to for finding outlines. You set two thresholds: low and high. Edges with gradient above high are strong, below low are discarded, and in between are kept only if connected to a strong edge. I tweak these thresholds manually depending on the image, but 50 and 150 work as a starting point for most natural scenes.
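In code, that starting point is just a couple of lines; the light blur beforehand is my own habit, there to keep Canny from chasing noise:

def outline(image, low=50, high=150):
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)
    return cv2.Canny(blurred, low, high)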
The fifth technique is morphological operations — a fancy name for expanding or shrinking dark and light regions. Think of them as eroding or dilating structures in a binary mask. I used this to clean up a mask of bacterial colonies in a petri dish. Small specks (noise) disappeared after an opening (erosion then dilation), and holes inside colonies were filled with a closing (dilation then erosion).
def clean_mask(mask):
    kernel = np.ones((5, 5), np.uint8)
    # Opening removes small specks; closing fills small holes
    opening = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    closing = cv2.morphologyEx(opening, cv2.MORPH_CLOSE, kernel)
    return closing
Once you have a clean binary image, contour detection — technique number six — lets you find connected components. I built a simple object counter for a factory assembly line: threshold the image to separate objects from the background, find contours, and filter by area. You can also approximate each contour to find how many vertices it has — perfect for identifying squares or triangles.
def count_objects(binary):
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    count = 0
    for cnt in contours:
        area = cv2.contourArea(cnt)
        if area > 100:  # ignore tiny noise
            count += 1
    return count
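The vertex-counting idea mentioned above can be sketched with cv2.approxPolyDP; the 0.04 tolerance factor is a common starting point, not a universal constant:

def shape_name(cnt):
    # Approximation tolerance is relative to the contour's perimeter
    peri = cv2.arcLength(cnt, True)
    approx = cv2.approxPolyDP(cnt, 0.04 * peri, True)
    if len(approx) == 3:
        return "triangle"
    if len(approx) == 4:
        return "quadrilateral"
    return "other"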
I once had a folder of scanned receipts. I used contour detection to isolate each receipt from the background, then applied perspective correction on each one individually. The pipeline saved hours of manual work.
Template matching is technique seven. It’s useful when you have a known pattern and want to find all its occurrences in a larger image. I used it to locate company logos in a web capture. OpenCV slides a small template across the large image and computes a correlation score at each position. The highest score marks the best match.
def find_logo(scene, template):
    result = cv2.matchTemplate(scene, template, cv2.TM_CCOEFF_NORMED)
    min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(result)
    if max_val > 0.8:
        return max_loc  # top-left corner of the best match
    # To collect every occurrence above the threshold instead of just
    # the best one, use np.where(result >= 0.8)
    return None
Be careful: template matching is not rotation or scale invariant. If the logo appears rotated, you’d need to try multiple orientations. That’s when you might move to feature matching (SIFT, ORB) or, even better, deep learning.
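To give a taste of feature matching, here is a minimal ORB sketch on grayscale inputs; it returns matches sorted by distance, which you would then filter and feed to a homography estimate:

def match_features(scene_gray, template_gray):
    orb = cv2.ORB_create()
    kp1, des1 = orb.detectAndCompute(template_gray, None)
    kp2, des2 = orb.detectAndCompute(scene_gray, None)
    if des1 is None or des2 is None:
        return []
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)
    # Smaller distance means a better match
    return sorted(matches, key=lambda m: m.distance)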
Which brings me to the eighth technique: integrating deep learning models with OpenCV’s DNN module. This is for when classical methods fail — for example, detecting objects of many different shapes and colors. I downloaded a pre-trained YOLO model (you can get the weights and config files from the Darknet GitHub repository). OpenCV can load it with cv2.dnn.readNet and run inference in a few lines.
def detect_objects(image, net, classes):
    blob = cv2.dnn.blobFromImage(image, 1/255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(net.getUnconnectedOutLayersNames())
    # Parse outputs into boxes, confidences, and class ids (via the
    # classes list), then apply cv2.dnn.NMSBoxes to drop overlaps
    return outputs
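Loading the network itself is a single call; the file names here are placeholders for whatever weights and config you downloaded:

# Placeholder file names; substitute your downloaded files
net = cv2.dnn.readNet("yolov3.weights", "yolov3.cfg")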
The cool part is you can combine deep learning detection with classical post-processing. For instance, after YOLO finds a face, you can apply histogram equalization on that region to improve lighting. The pipeline becomes a hybrid of learned and traditional methods.
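A sketch of that hybrid step, assuming the detector hands you an (x, y, w, h) box; equalizing in YCrCb touches only luminance, so the colors survive:

def brighten_region(image, box):
    x, y, w, h = box  # (x, y, w, h) from the detector
    roi = image[y:y+h, x:x+w]
    ycrcb = cv2.cvtColor(roi, cv2.COLOR_BGR2YCrCb)
    # Equalize only the luminance channel
    ycrcb[:, :, 0] = cv2.equalizeHist(ycrcb[:, :, 0])
    image[y:y+h, x:x+w] = cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)
    return image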
No real-world pipeline would be complete without batch processing. I once needed to process 10,000 images. Doing it one by one in a for loop would take hours. I used Python’s concurrent.futures.ThreadPoolExecutor to process several files at once. OpenCV releases the GIL while it runs its native routines, so threading works well for image loading, processing, and writing.
import concurrent.futures
from pathlib import Path

def worker(file, output_dir):
    img = cv2.imread(str(file))
    processed = my_pipeline(img)
    out_path = output_dir / f"proc_{file.name}"
    cv2.imwrite(str(out_path), processed)
    return file.name

def batch_process(input_dir, output_dir, max_workers=4):
    input_path = Path(input_dir)
    output_path = Path(output_dir)
    output_path.mkdir(exist_ok=True)
    files = list(input_path.glob("*.jpg")) + list(input_path.glob("*.png"))
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        results = executor.map(lambda f: worker(f, output_path), files)
        for r in results:
            print(r)
I’m still amazed at how far you can go with these eight techniques. They cover 90% of what you need for everyday image processing tasks. Start with a small pipeline — load an image, convert to grayscale, blur a little, and run edge detection. Then add one more step each time. Over the weeks, you’ll have a system that can read, clean, analyze, and understand images in ways you once thought required a team of PhDs. The key is to build iteratively, test each step visually, and never stop experimenting. I still keep a folder of test images with different challenges — low light, motion blur, funny colors — to stress-test every new pipeline I write. That’s the real secret to good image processing: not the coolest algorithm, but a pipeline you trust to work on real-world data.