How I Accidentally Taught My Computer to Be a Botanist
Shape or color - what's more important for recognizing a flower?
I was doom-scrolling YouTube comments (as one does) when someone mentioned this thing called HOG - Histograms of Oriented Gradients. Three rabbit holes and a Wikipedia binge later, I'm building a flower classifier that hits 94% accuracy.
Here's the kicker: I combined a 20-year-old edge detection technique with basic color counting, and it works better than some neural networks I've tried. Welcome to the world where old-school computer vision still kicks ass. 🌻
The Mission: Teaching Silicon to Stop and Smell the Roses
We're building a classifier for the 17 flowers dataset - basically the most British collection of flowers you can imagine. Daffodils, bluebells, the works.
Why this matters: While everyone's throwing neural networks at everything (guilty), sometimes the "ancient" techniques from 2005 work brilliantly. Plus, you'll actually understand what's happening instead of staring at a black box.
Spoiler alert: By the end, your computer will identify flowers better than most humans. Mine certainly beats me - I call everything either "rose" or "not rose."
HOG: When Edge Detection Goes Super Saiyan
Histograms of Oriented Gradients sounds intimidating. It's not. Think of it this way: Instead of just finding edges (boring), we find edges AND remember which direction they're pointing (genius). It's like your edge detector finally learned to use a compass.
Here's what we're actually doing:
- Find all the edges
- Figure out which way they're pointing
- Count them up in little neighborhoods
- Profit
Let me show you on an actual daffodil:
Our victim - I mean, subject. A perfectly innocent daffodil about to be mathematically dissected.
Step 1: Preprocessing (Making Everything the Same Size)
The original HOG paper used 64×128 pixels for detecting humans. Why? ¯\_(ツ)_/¯
They literally said it "seemed to work well." Peak scientific rigor right there. For flowers, I went with 256×256 because:
- Flowers aren't shaped like standing humans (shocking, I know)
- It divides nicely for our calculations
- Bigger = more detail for those intricate petals
One size fits all - flower edition
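If you want to follow along, the whole preprocessing step is a couple of lines with Pillow and numpy (a minimal sketch; daffodil.jpg is just a stand-in for whatever image you're using):

from PIL import Image
import numpy as np

# Load a flower image, go grayscale (for the DIY version), and force a fixed size.
# 256x256 divides evenly into the 16x16 cells we'll use later.
image = Image.open("daffodil.jpg").convert("L")
resized = np.array(image.resize((256, 256)), dtype=float)

# Scale pixel values to [0, 1] so gradient magnitudes are comparable across images
resized = (resized - resized.min()) / (resized.max() - resized.min())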
Step 2: Gradient Computation (Finding the Edges)
Time to bust out the Sobel kernels. Don't panic - they're just tiny matrices that are really good at finding edges:
- [-1, 0, 1] for the horizontal direction
- The same thing stood upright (its transpose) for the vertical direction
We slide these bad boys across the image, hunting for edges like a detective with a magnifying glass.
Fun fact that broke my brain: The horizontal kernel detects VERTICAL edges and vice versa. Because math hates intuition.
Left: Vertical edges. Right: Horizontal edges. Your flower is now a fancy edge map.
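In code, the whole step is just two convolutions (a sketch with scipy, assuming resized is the normalized array from step 1):

from scipy.signal import convolve2d
import numpy as np

# The 1-D Sobel-style kernel and its transpose
horizontal_kernel = np.array([[-1, 0, 1]])
vertical_kernel = horizontal_kernel.T

# mode='same' pads the edges so the gradient maps stay 256x256
horizontal_gradient = convolve2d(resized, horizontal_kernel, mode='same')
vertical_gradient = convolve2d(resized, vertical_kernel, mode='same')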
Step 3: Magnitude & Orientation (The Pythagorean Revenge)
Remember the Pythagorean theorem from high school? Time for its comeback tour! We combine horizontal and vertical gradients to get:
- Magnitude: How strong is this edge? (Using good ol' a² + b² = c²)
- Orientation: Which way is it pointing? (arctan has entered the chat)
Every pixel now knows which way it's pointing. They're all little compasses now. The original paper used angles from 0° to 180° (unsigned). I tried both signed and unsigned - made zero difference for flowers. Apparently, petals don't care about math conventions.
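In numpy terms it's one line each, picking up the gradient maps from the previous sketch:

# Edge strength per pixel: sqrt(gx^2 + gy^2)
magnitude = np.sqrt(horizontal_gradient ** 2 + vertical_gradient ** 2)

# Edge direction in degrees, folded into [0, 180) for unsigned orientations
orientation = np.degrees(np.arctan2(vertical_gradient, horizontal_gradient)) % 180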
Step 4: Histogram Magic
Now for the clever bit. We chop the image into cells (I used 16×16 pixels) and create a histogram for each cell.
Translation: We count which directions the edges are pointing in each little neighborhood.
Dividing our flower into a grid. Each square gets its own edge-direction census.
Each cell becomes 9 numbers (one for each direction bin). It's like each cell is saying: "I've got 3 edges pointing up, 5 pointing right, 1 confused edge..."
Building histograms: We're literally counting edge directions. Democracy for gradients!
The result? Your flower is now described by dominant edge patterns:
Those lines show the dominant gradients in each cell. Your daffodil is now abstract art.
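Here's the counting spelled out for a single 16×16 cell (a toy sketch using the magnitude and orientation maps from step 3; each pixel votes with its magnitude, not a plain count):

BAND_COUNT = 9
BIN_WIDTH = 180 / BAND_COUNT  # 20 degrees per bin

# Take the top-left 16x16 cell as an example
cell_mag = magnitude[0:16, 0:16]
cell_ori = orientation[0:16, 0:16]

histogram = np.zeros(BAND_COUNT)
for m, o in zip(cell_mag.ravel(), cell_ori.ravel()):
    bin_index = min(int(o // BIN_WIDTH), BAND_COUNT - 1)  # clamp in case o lands on 180.0
    histogram[bin_index] += m  # stronger edges get louder votes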
Step 5: Block Normalization (The Part Everyone Skips)
Here's where it gets a bit tedious but crucial. We group cells into blocks (2×2 cells) and normalize them. Why? Because lighting can be awful and shadows are jerks.
Normalization makes our features immune to "Is this a dark rose or a bright rose?" problems.
Sliding blocks across our cells. Yes, they overlap. Yes, that's intentional. No, I don't know why it works better.
Final feature count: 15×15×4×9 = 8,100 numbers describing our flower. That's a lot of numbers for one daffodil.
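Here's a sketch of that normalization, assuming cell_histograms is the 16×16×9 array of per-cell histograms from step 4. The overlapping slide is exactly why the count works out to 15×15 blocks:

# 16x16 cells, 2x2-cell blocks, sliding one cell at a time -> 15x15 blocks
hog_descriptor = []
for by in range(15):
    for bx in range(15):
        block = cell_histograms[by:by + 2, bx:bx + 2].ravel()  # 4 cells x 9 bins = 36 values
        block = block / np.sqrt(np.sum(block ** 2) + 1e-6)     # L2 norm; epsilon avoids /0
        hog_descriptor.extend(block)

hog_descriptor = np.array(hog_descriptor)
print(hog_descriptor.shape)  # (8100,) = 15 * 15 * 4 * 9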
Bonus Round: Color Histograms (Because Flowers Are Colorful, Duh)
HOG is colorblind by default. But we're classifying FLOWERS. Color matters! So I added color histograms - basically counting how many pixels are each shade of red, green, and blue. Dead simple, surprisingly effective.
Top: Original flower. Bottom: "How much of each color?" Roses are red, violets are blue, histograms don't rhyme.
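The whole idea fits in a few lines (a sketch; rgb_image stands in for the original image as an H×W×3 uint8 array):

color_features = []
for channel in range(3):  # R, G, B
    hist, _ = np.histogram(rgb_image[..., channel].ravel(), bins=256, range=(0, 256))
    color_features.append(hist / hist.sum())  # normalize so image size doesn't matter

color_features = np.concatenate(color_features)  # 768 numbers describing the colors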
The Code: Build Your Own Digital Botanist
The DIY Version (For Understanding)
import numpy as np
from PIL import Image, ImageDraw
import matplotlib.pyplot as plt
from scipy.signal import convolve2d
class HOGExtractor:
    def __init__(self, image_size=(256, 256), cell_size=(16, 16), block_size=(2, 2), band_count=9):
        self.IMAGE_RESIZE_SIZE = image_size
        self.CELL_SIZE = cell_size
        self.BLOCK_SIZE = block_size
        self.BAND_COUNT = band_count
        self.BIN_WIDTH = 180 / band_count
        # Sobel operators for gradient computation
        self.horizontal_kernel = np.array([[-1, 0, 1]])
        self.vertical_kernel = np.array(self.horizontal_kernel.T)
        # Initialize computed attributes
        self.input_image = None
        self.resized_image = None
        self.gradient_magnitude = None
        self.gradient_orientation = None
        self.cell_histograms = None
        self.hog_descriptor = None
    def _load_image(self, pil_image):
        # The HOG authors found an accuracy boost from using all three RGB channels;
        # we convert to grayscale here to keep this example simple
        self.input_image = pil_image.convert('L')
        self.resized_image = np.array(self.input_image.resize(self.IMAGE_RESIZE_SIZE))
        self.resized_image = self.resized_image.astype(float)
        # Normalize the image pixel values to [0, 1]
        self.resized_image = (self.resized_image - self.resized_image.min()) / (self.resized_image.max() - self.resized_image.min())
    def _compute_gradients(self):
        # Apply Sobel kernels using convolution
        # mode='same' ensures the output has the same dimensions as the input image by automatically adding appropriate padding
        horizontal_gradient = convolve2d(self.resized_image, self.horizontal_kernel, mode='same')
        vertical_gradient = convolve2d(self.resized_image, self.vertical_kernel, mode='same')
        # Calculate gradient magnitude and orientation
        self.gradient_magnitude = np.sqrt(horizontal_gradient ** 2 + vertical_gradient ** 2)
        self.gradient_orientation = np.arctan2(vertical_gradient, horizontal_gradient) * (180 / np.pi) % 180
    def compute_cell_histograms(self):
        # Calculate number of cells in each dimension
        cells_y = self.IMAGE_RESIZE_SIZE[1] // self.CELL_SIZE[1]
        cells_x = self.IMAGE_RESIZE_SIZE[0] // self.CELL_SIZE[0]
        # Initialize histogram array for all cells
        self.cell_histograms = np.zeros((cells_y, cells_x, self.BAND_COUNT))
        # Compute histograms for each cell
        for y in range(cells_y):
            for x in range(cells_x):
                # Get current cell coordinates
                y_start = y * self.CELL_SIZE[1]
                y_end = (y + 1) * self.CELL_SIZE[1]
                x_start = x * self.CELL_SIZE[0]
                x_end = (x + 1) * self.CELL_SIZE[0]
                # Get magnitudes and orientations for current cell
                cell_magnitudes = self.gradient_magnitude[y_start:y_end, x_start:x_end]
                cell_orientations = self.gradient_orientation[y_start:y_end, x_start:x_end]
                # Create histogram for current cell
                histogram = np.zeros(self.BAND_COUNT)
                # Go over each pixel in the cell
                for i in range(self.CELL_SIZE[1]):
                    for j in range(self.CELL_SIZE[0]):
                        orientation = cell_orientations[i, j]
                        magnitude = cell_magnitudes[i, j]
                        # Compute bin index for current orientation (clamped so a stray 180.0
                        # can't fall outside the last bin), and add magnitude to that bin
                        bin_index = min(int(orientation // self.BIN_WIDTH), self.BAND_COUNT - 1)
                        histogram[bin_index] += magnitude
                self.cell_histograms[y, x] = histogram
    def compute_hog_descriptor(self):
        # Calculate number of blocks
        cells_y = self.IMAGE_RESIZE_SIZE[1] // self.CELL_SIZE[1]
        cells_x = self.IMAGE_RESIZE_SIZE[0] // self.CELL_SIZE[0]
        blocks_y = cells_y - self.BLOCK_SIZE[0] + 1
        blocks_x = cells_x - self.BLOCK_SIZE[1] + 1
        # Initialize final HOG descriptor
        hog_descriptor = []
        # Slide the block window across cells
        for y in range(blocks_y):
            for x in range(blocks_x):
                # Get histograms for current block (2x2 cells)
                block_histograms = []
                for cell_y in range(self.BLOCK_SIZE[0]):
                    for cell_x in range(self.BLOCK_SIZE[1]):
                        cell_histogram = self.cell_histograms[y + cell_y, x + cell_x]
                        block_histograms.extend(cell_histogram)
                # Normalize block using L2 norm
                # Small epsilon value prevents division by zero
                block_histograms = np.array(block_histograms)
                l2_norm = np.sqrt(np.sum(block_histograms ** 2) + 1e-6)
                normalized_block = block_histograms / l2_norm
                # Add normalized block histograms to final descriptor
                hog_descriptor.extend(normalized_block)
        self.hog_descriptor = np.array(hog_descriptor)
        return self.hog_descriptor

    def extract_features(self, pil_image):
        self._load_image(pil_image)
        self._compute_gradients()
        self.compute_cell_histograms()
        return self.compute_hog_descriptor()
    def visualize(self):
        self._visualize_hog()

    def _visualize_hog(self):
        # Calculate dimensions
        cells_y = self.IMAGE_RESIZE_SIZE[1] // self.CELL_SIZE[1]
        cells_x = self.IMAGE_RESIZE_SIZE[0] // self.CELL_SIZE[0]
        # Create visualization
        vis_image = Image.new('RGB', self.IMAGE_RESIZE_SIZE, 'black')
        draw = ImageDraw.Draw(vis_image)
        cell_height, cell_width = self.CELL_SIZE
        line_length = min(cell_height, cell_width) // 2
        # Draw lines using raw cell histograms directly
        for y in range(cells_y):
            for x in range(cells_x):
                # Use raw cell histograms instead of normalized ones
                raw_histogram = self.cell_histograms[y, x]
                self._draw_cell_visualization(draw, x, y, cell_width, cell_height,
                                              line_length, raw_histogram)
        self._show_visualization(vis_image, 'Raw HOG Visualization')
    # Private helper function to draw cell visualization
    def _draw_cell_visualization(self, draw, x, y, cell_width, cell_height, line_length, histogram):
        cell_center_y = (y + 0.5) * cell_height
        cell_center_x = (x + 0.5) * cell_width
        # Skip cells with no gradient energy so we never divide by zero below
        max_magnitude = np.max(histogram)
        if max_magnitude == 0:
            return
        for orientation_bin in range(self.BAND_COUNT):
            orientation = orientation_bin * (180 / self.BAND_COUNT)
            magnitude = histogram[orientation_bin]
            radian = np.deg2rad(orientation)
            dx = line_length * np.cos(radian) * magnitude / max_magnitude
            dy = line_length * np.sin(radian) * magnitude / max_magnitude
            draw.line([
                (cell_center_x - dx, cell_center_y - dy),
                (cell_center_x + dx, cell_center_y + dy)
            ], fill='white', width=1)
    # Private helper function to show visualization
    def _show_visualization(self, vis_image, title):
        plt.figure(figsize=(10, 15))
        plt.subplot(311)
        plt.title('Original Image')
        plt.imshow(self.input_image, cmap='gray')
        plt.subplot(312)
        plt.title(title)
        plt.imshow(vis_image)
        plt.tight_layout()
        plt.show()
The Pro Version (For Actually Using)
from skimage.feature import hog
from skimage.transform import resize
import numpy as np
import matplotlib.pyplot as plt
class HOGExtractor:
    def __init__(self, image_size=(256, 256), cell_size=(16, 16), block_size=(2, 2), band_count=9):
        self.image_size = image_size
        self.input_image = None
        self.hog_image = None
        self.cell_size = cell_size
        self.block_size = block_size
        self.band_count = band_count
    def extract_features(self, image):
        self.input_image = image
        img_array = np.array(image)
        img_array = resize(img_array, self.image_size)
        features, hog_image = hog(
            img_array,
            orientations=self.band_count,
            pixels_per_cell=self.cell_size,
            cells_per_block=self.block_size,
            visualize=True,
            channel_axis=-1
        )
        self.hog_image = hog_image
        return features
    def visualize(self):
        if self.input_image is None or self.hog_image is None:
            return
        plt.figure(figsize=(10, 5))
        plt.subplot(121)
        plt.title('Original Image')
        plt.imshow(self.input_image)
        plt.axis('off')
        plt.subplot(122)
        plt.title('HOG Visualization')
        plt.imshow(self.hog_image, cmap='gray')
        plt.axis('off')
        plt.tight_layout()
        plt.show()
import numpy as np
import matplotlib.pyplot as plt
class ColorHistogramExtractor:
    def __init__(self, bins=256, channels=3):
        self.bins = bins
        self.channels = channels
        self.image = None
        self.histograms = None
        self.colors = ['red', 'green', 'blue']
        self.channel_names = ['Red', 'Green', 'Blue']

    def load_image(self, pil_image):
        self.image = np.array(pil_image)
        return self
    def extract_features(self, image_array=None, normalize=True):
        if image_array is not None:
            self.load_image(image_array)
        self.histograms = []
        for channel in range(self.channels):
            histogram, _ = np.histogram(
                # Select all pixels for one color channel (R, G, or B) using numpy's
                # ellipsis notation and flatten the 2D array of pixel values into 1D
                self.image[..., channel].ravel(),
                bins=self.bins,  # divides the range into equal-width bins
                range=(0, 256)
            )
            if normalize:
                histogram = histogram / histogram.sum()
            self.histograms.append(histogram)
        return np.concatenate(self.histograms)
    def visualize(self):
        plt.figure(figsize=(10, 6))
        # Plot original image
        plt.subplot(2, 1, 1)
        plt.title('Original Image')
        plt.imshow(self.image)
        plt.axis('off')
        # Plot histograms as bars
        plt.subplot(2, 1, 2)
        plt.title('Color Histograms')
        x = np.linspace(0, 1, self.bins)  # Normalized x-axis [0, 1]
        bar_width = 1.0 / self.bins
        for channel in range(self.channels):
            plt.bar(x, self.histograms[channel],
                    color=self.colors[channel],
                    label=self.channel_names[channel],
                    alpha=0.3,
                    width=bar_width)
        plt.xlabel('Pixel Intensity')
        plt.ylabel('Frequency')
        plt.legend()
        plt.grid(True, alpha=0.3)
        plt.xlim(0, 1)  # Set x-axis limits to [0, 1]
        plt.tight_layout()
        plt.show()
Pro tip: Use scikit-image. It's literally 10x faster. I tested. Unless you're a masochist or doing homework, use the library.
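For reference, here's roughly how I wire the two extractors together on one image before anything goes near the classifier (using the scikit-image HOGExtractor and the ColorHistogramExtractor above; the filename is just a placeholder):

from PIL import Image
import numpy as np

hog_extractor = HOGExtractor()
color_extractor = ColorHistogramExtractor()

image = Image.open("image_0001.jpg").convert("RGB")
hog_features = hog_extractor.extract_features(image)      # 8,100 values
color_features = color_extractor.extract_features(image)  # 768 values

features = np.concatenate([hog_features, color_features])
print(features.shape)  # (8868,)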
The Moment of Truth: Building the Classifier
Time to put it all together with a Random Forest (because they're impossible to kill and always work):
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
# Load your flower dataset (2040 training, 1040 test images)
# Each class has 80 images, math checks out
# Extract features (HOG + Color)
def extract_all_features(image):
    hog_features = hog_extractor.extract_features(image)
    color_features = color_extractor.extract_features(image)
    return np.concatenate([hog_features, color_features])
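# A sketch of building the feature matrices - this assumes you've already loaded the
# dataset into lists of PIL images with matching labels (these variable names are mine,
# not part of the dataset)
X_train = np.array([extract_all_features(img) for img in train_images])
y_train = np.array(train_labels)
X_test = np.array([extract_all_features(img) for img in test_images])
y_test = np.array(test_labels)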
# Build the pipeline
pipeline = Pipeline([
    ('scaler', StandardScaler()),  # Normalize everything
    ('classifier', RandomForestClassifier(
        n_estimators=300,          # 300 trees in this forest
        max_depth=50,              # Deep trees for complex flowers
        min_samples_split=5,       # Don't overfit too hard
        min_samples_leaf=2,
        max_features='sqrt',       # Random subset of features
        class_weight='balanced_subsample'  # Handle imbalanced classes
    ))
])
# Train it
pipeline.fit(X_train, y_train)
# The moment of truth
train_accuracy = pipeline.score(X_train, y_train)
test_accuracy = pipeline.score(X_test, y_test)
print(f"Train: {train_accuracy:.3f}, Test: {test_accuracy:.3f}")
# Train: 1.000, Test: 0.946
The Results Are In: 94.6% Accuracy! 🎉
Train accuracy: 100% (Random Forests memorize everything, the overachievers)
Test accuracy: 94.6% (this is what actually matters)
That 5% gap? That's overfitting, but honestly, for Random Forests with 300 deep trees, it's totally normal. They're supposed to memorize the training data; it's literally their job.
What We Just Built
Let's appreciate what happened here:
- We took a 20-year-old technique (HOG)
- Added basic color counting
- Threw in a Random Forest
- Beat most humans at flower identification
Your computer can now:
- Detect petal shapes through edge patterns
- Remember color distributions
- Combine both to identify flowers with 94.6% accuracy
All without a single neural network. Take that, deep learning! (Just kidding, I love neural networks too)
Level Up Your Flower Game
Want to push past 94%? Try these:
- Local Binary Patterns (LBP): Another texture descriptor that's stupid simple but effective
- Different classifiers: An SVM might squeeze out another percent or two (see the sketch after this list)
- Data augmentation: Rotate those flowers, flip them, make them work for it
- Deep features: Fine, use a neural network for features but keep the Random Forest (hybrid approach)
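If you want to try the SVM route, it's basically a one-line swap in the pipeline (a rough sketch with untuned parameters, reusing X_train and y_train from before):

from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

# Same recipe as before, just with an RBF-kernel SVM instead of the Random Forest
svm_pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', SVC(kernel='rbf', C=10, gamma='scale', class_weight='balanced'))
])
svm_pipeline.fit(X_train, y_train)
print(svm_pipeline.score(X_test, y_test))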
You're a Computer Vision Expert Now (Or Almost)
Seriously, you just understood:
- Edge detection with personality (HOG)
- Why normalization matters (silly shadows)
- How to combine shape and color features
- Building a classifier that actually works
This isn't some toy example; it's legitimate computer vision that was state-of-the-art not that long ago and STILL works brilliantly for many real-world problems.
The best part? You can explain every single step. No black boxes, no "the network learned something," just pure, understandable math doing its thing.
Want the deep dive? Check out the full technical analysis on my blog