How I Accidentally Taught My Computer to Be a Botanist
Shape or color - what's more important for recognizing a flower?
I was doom-scrolling YouTube comments (as one does) when someone mentioned this thing called HOG - Histograms of Oriented Gradients. Three rabbit holes and a Wikipedia binge later, I'm building a flower classifier that hits 94% accuracy.
Here's the kicker: I combined a 20-year-old edge detection technique with basic color counting, and it works better than some neural networks I've tried. Welcome to the world where old-school computer vision still kicks ass. 🌻
The Mission: Teaching Silicon to Stop and Smell the Roses
We're building a classifier for the 17 flowers dataset - basically the most British collection of flowers you can imagine. Daffodils, bluebells, the works.
Why this matters: While everyone's throwing neural networks at everything (guilty), sometimes the "ancient" techniques from 2005 work brilliantly. Plus, you'll actually understand what's happening instead of staring at a black box.
Spoiler alert: By the end, your computer will identify flowers better than most humans. Mine certainly beats me - I call everything either "rose" or "not rose."
HOG: When Edge Detection Goes Super Saiyan
Histograms of Oriented Gradients sounds intimidating. It's not. Think of it this way: Instead of just finding edges (boring), we find edges AND remember which direction they're pointing (genius). It's like your edge detector finally learned to use a compass.
Here's what we're actually doing:
- Find all the edges
- Figure out which way they're pointing
- Count them up in little neighborhoods
- Profit
Let me show you on an actual daffodil:
Our victim - I mean, subject. A perfectly innocent daffodil about to be mathematically dissected.
Step 1: Preprocessing (Making Everything the Same Size)
The original HOG paper used 64×128 pixels for detecting humans. Why? ¯\_(ツ)_/¯
They literally said it "seemed to work well." Peak scientific rigor right there. For flowers, I went with 256×256 because:
- Flowers aren't shaped like standing humans (shocking, I know)
- It divides nicely for our calculations
- Bigger = more detail for those intricate petals
One size fits all - flower edition
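If you want to follow along, the whole preprocessing step is a couple of lines with Pillow and numpy (a minimal sketch; daffodil.jpg is just a stand-in for whatever image you're using):

from PIL import Image
import numpy as np

# Load a flower image, go grayscale (for the DIY version), and force a fixed size.
# 256x256 divides evenly into the 16x16 cells we'll use later.
image = Image.open("daffodil.jpg").convert("L")
resized = np.array(image.resize((256, 256)), dtype=float)

# Scale pixel values to [0, 1] so gradient magnitudes are comparable across images
resized = (resized - resized.min()) / (resized.max() - resized.min())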
Step 2: Gradient Computation (Finding the Edges)
Time to bust out the Sobel kernels. Don't panic - they're just tiny matrices that are really good at finding edges:
- [-1, 0, 1] for the horizontal direction
- The same thing stood upright (its transpose) for the vertical direction
We slide these bad boys across the image, hunting for edges like a detective with a magnifying glass.
Fun fact that broke my brain: The horizontal kernel detects VERTICAL edges and vice versa. Because math hates intuition.
Left: Vertical edges. Right: Horizontal edges. Your flower is now a fancy edge map.
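In code, the whole step is just two convolutions (a sketch with scipy, assuming resized is the normalized array from step 1):

from scipy.signal import convolve2d
import numpy as np

# The 1-D Sobel-style kernel and its transpose
horizontal_kernel = np.array([[-1, 0, 1]])
vertical_kernel = horizontal_kernel.T

# mode='same' pads the edges so the gradient maps stay 256x256
horizontal_gradient = convolve2d(resized, horizontal_kernel, mode='same')
vertical_gradient = convolve2d(resized, vertical_kernel, mode='same')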
Step 3: Magnitude & Orientation (The Pythagorean Revenge)
Remember the Pythagorean theorem from high school? Time for its comeback tour! We combine horizontal and vertical gradients to get:
- Magnitude: How strong is this edge? (Using good ol' a² + b² = c²)
- Orientation: Which way is it pointing? (arctan has entered the chat)
Every pixel now knows which way it's pointing. They're all little compasses now. The original paper used angles from 0° to 180° (unsigned). I tried both signed and unsigned - made zero difference for flowers. Apparently, petals don't care about math conventions.
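In numpy terms it's one line each, picking up the gradient maps from the previous sketch:

# Edge strength per pixel: sqrt(gx^2 + gy^2)
magnitude = np.sqrt(horizontal_gradient ** 2 + vertical_gradient ** 2)

# Edge direction in degrees, folded into [0, 180) for unsigned orientations
orientation = np.degrees(np.arctan2(vertical_gradient, horizontal_gradient)) % 180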
Step 4: Histogram Magic
Now for the clever bit. We chop the image into cells (I used 16×16 pixels) and create a histogram for each cell.
Translation: We count which directions the edges are pointing in each little neighborhood.
Dividing our flower into a grid. Each square gets its own edge-direction census.
Each cell becomes 9 numbers (one for each direction bin). It's like each cell is saying: "I've got 3 edges pointing up, 5 pointing right, 1 confused edge..."
Building histograms: We're literally counting edge directions. Democracy for gradients!
The result? Your flower is now described by dominant edge patterns:
Those lines show the dominant gradients in each cell. Your daffodil is now abstract art.
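Here's the counting spelled out for a single 16×16 cell (a toy sketch using the magnitude and orientation maps from step 3; each pixel votes with its magnitude, not a plain count):

BAND_COUNT = 9
BIN_WIDTH = 180 / BAND_COUNT  # 20 degrees per bin

# Take the top-left 16x16 cell as an example
cell_mag = magnitude[0:16, 0:16]
cell_ori = orientation[0:16, 0:16]

histogram = np.zeros(BAND_COUNT)
for m, o in zip(cell_mag.ravel(), cell_ori.ravel()):
    bin_index = min(int(o // BIN_WIDTH), BAND_COUNT - 1)  # clamp in case o lands on 180.0
    histogram[bin_index] += m  # stronger edges get louder votes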
Step 5: Block Normalization (The Part Everyone Skips)
Here's where it gets a bit tedious but crucial. We group cells into blocks (2×2 cells) and normalize them. Why? Because lighting can be awful and shadows are jerks.
Normalization makes our features immune to "Is this a dark rose or a bright rose?" problems.
Sliding blocks across our cells. Yes, they overlap. Yes, that's intentional. No, I don't know why it works better.
Final feature count: 15×15×4×9 = 8,100 numbers describing our flower. That's a lot of numbers for one daffodil.
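Here's a sketch of that normalization, assuming cell_histograms is the 16×16×9 array of per-cell histograms from step 4. The overlapping slide is exactly why the count works out to 15×15 blocks:

# 16x16 cells, 2x2-cell blocks, sliding one cell at a time -> 15x15 blocks
hog_descriptor = []
for by in range(15):
    for bx in range(15):
        block = cell_histograms[by:by + 2, bx:bx + 2].ravel()  # 4 cells x 9 bins = 36 values
        block = block / np.sqrt(np.sum(block ** 2) + 1e-6)     # L2 norm; epsilon avoids /0
        hog_descriptor.extend(block)

hog_descriptor = np.array(hog_descriptor)
print(hog_descriptor.shape)  # (8100,) = 15 * 15 * 4 * 9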
Bonus Round: Color Histograms (Because Flowers Are Colorful, Duh)
HOG is colorblind by default. But we're classifying FLOWERS. Color matters! So I added color histograms - basically counting how many pixels are each shade of red, green, and blue. Dead simple, surprisingly effective.
Top: Original flower. Bottom: "How much of each color?" Roses are red, violets are blue, histograms don't rhyme.
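The whole idea fits in a few lines (a sketch; rgb_image stands in for the original image as an H×W×3 uint8 array):

color_features = []
for channel in range(3):  # R, G, B
    hist, _ = np.histogram(rgb_image[..., channel].ravel(), bins=256, range=(0, 256))
    color_features.append(hist / hist.sum())  # normalize so image size doesn't matter

color_features = np.concatenate(color_features)  # 768 numbers describing the colors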
The Code: Build Your Own Digital Botanist
The DIY Version (For Understanding)
import numpy as np
from PIL import Image, ImageDraw
import matplotlib.pyplot as plt
from scipy.signal import convolve2d
class HOGExtractor:
    def __init__(self, image_size=(256, 256), cell_size=(16, 16), block_size=(2, 2), band_count=9):
        self.IMAGE_RESIZE_SIZE = image_size
        self.CELL_SIZE = cell_size
        self.BLOCK_SIZE = block_size
        self.BAND_COUNT = band_count
        self.BIN_WIDTH = 180 / band_count
        # Sobel operators for gradient computation
        self.horizontal_kernel = np.array([[-1, 0, 1]])
        self.vertical_kernel = np.array(self.horizontal_kernel.T)
        # Initialize computed attributes
        self.input_image = None
        self.resized_image = None
        self.gradient_magnitude = None
        self.gradient_orientation = None
        self.cell_histograms = None
        self.hog_descriptor = None
    def _load_image(self, pil_image):
        # The HOG authors found an accuracy boost from using all three RGB channels;
        # we convert to grayscale here to keep this example simple
        self.input_image = pil_image.convert('L')
        self.resized_image = np.array(self.input_image.resize(self.IMAGE_RESIZE_SIZE))
        self.resized_image = self.resized_image.astype(float)
        # Normalize the image pixel values to [0, 1]
        self.resized_image = (self.resized_image - self.resized_image.min()) / (self.resized_image.max() - self.resized_image.min())
    def _compute_gradients(self):
        # Apply Sobel kernels using convolution
        # mode='same' ensures the output has the same dimensions as the input image by automatically adding appropriate padding
        horizontal_gradient = convolve2d(self.resized_image, self.horizontal_kernel, mode='same')
        vertical_gradient = convolve2d(self.resized_image, self.vertical_kernel, mode='same')
        # Calculate gradient magnitude and orientation
        self.gradient_magnitude = np.sqrt(horizontal_gradient ** 2 + vertical_gradient ** 2)
        self.gradient_orientation = np.arctan2(vertical_gradient, horizontal_gradient) * (180 / np.pi) % 180
    def compute_cell_histograms(self):
        # Calculate number of cells in each dimension
        cells_y = self.IMAGE_RESIZE_SIZE[1] // self.CELL_SIZE[1]
        cells_x = self.IMAGE_RESIZE_SIZE[0] // self.CELL_SIZE[0]
        # Initialize histogram array for all cells
        self.cell_histograms = np.zeros((cells_y, cells_x, self.BAND_COUNT))
        # Compute histograms for each cell
        for y in range(cells_y):
            for x in range(cells_x):
                # Get current cell coordinates
                y_start = y * self.CELL_SIZE[1]
                y_end = (y + 1) * self.CELL_SIZE[1]
                x_start = x * self.CELL_SIZE[0]
                x_end = (x + 1) * self.CELL_SIZE[0]
                # Get magnitudes and orientations for current cell
                cell_magnitudes = self.gradient_magnitude[y_start:y_end, x_start:x_end]
                cell_orientations = self.gradient_orientation[y_start:y_end, x_start:x_end]
                # Create histogram for current cell
                histogram = np.zeros(self.BAND_COUNT)
                # Go over each pixel in the cell
                for i in range(self.CELL_SIZE[1]):
                    for j in range(self.CELL_SIZE[0]):
                        orientation = cell_orientations[i, j]
                        magnitude = cell_magnitudes[i, j]
                        # Compute bin index for current orientation (clamped so a stray 180.0
                        # can't fall outside the last bin), and add magnitude to that bin
                        bin_index = min(int(orientation // self.BIN_WIDTH), self.BAND_COUNT - 1)
                        histogram[bin_index] += magnitude
                self.cell_histograms[y, x] = histogram
    def compute_hog_descriptor(self):
        # Calculate number of blocks
        cells_y = self.IMAGE_RESIZE_SIZE[1] // self.CELL_SIZE[1]
        cells_x = self.IMAGE_RESIZE_SIZE[0] // self.CELL_SIZE[0]
        blocks_y = cells_y - self.BLOCK_SIZE[0] + 1
        blocks_x = cells_x - self.BLOCK_SIZE[1] + 1
        # Initialize final HOG descriptor
        hog_descriptor = []
        # Slide the block window across cells
        for y in range(blocks_y):
            for x in range(blocks_x):
                # Get histograms for current block (2x2 cells)
                block_histograms = []
                for cell_y in range(self.BLOCK_SIZE[0]):
                    for cell_x in range(self.BLOCK_SIZE[1]):
                        cell_histogram = self.cell_histograms[y + cell_y, x + cell_x]
                        block_histograms.extend(cell_histogram)
                # Normalize block using L2 norm
                # Small epsilon value prevents division by zero
                block_histograms = np.array(block_histograms)
                l2_norm = np.sqrt(np.sum(block_histograms ** 2) + 1e-6)
                normalized_block = block_histograms / l2_norm
                # Add normalized block histograms to final descriptor
                hog_descriptor.extend(normalized_block)
        self.hog_descriptor = np.array(hog_descriptor)
        return self.hog_descriptor

    def extract_features(self, pil_image):
        self._load_image(pil_image)
        self._compute_gradients()
        self.compute_cell_histograms()
        return self.compute_hog_descriptor()
    def visualize(self):
        self._visualize_hog()

    def _visualize_hog(self):
        # Calculate dimensions
        cells_y = self.IMAGE_RESIZE_SIZE[1] // self.CELL_SIZE[1]
        cells_x = self.IMAGE_RESIZE_SIZE[0] // self.CELL_SIZE[0]
        # Create visualization
        vis_image = Image.new('RGB', self.IMAGE_RESIZE_SIZE, 'black')
        draw = ImageDraw.Draw(vis_image)
        cell_height, cell_width = self.CELL_SIZE
        line_length = min(cell_height, cell_width) // 2
        # Draw lines using raw cell histograms directly
        for y in range(cells_y):
            for x in range(cells_x):
                # Use raw cell histograms instead of normalized ones
                raw_histogram = self.cell_histograms[y, x]
                self._draw_cell_visualization(draw, x, y, cell_width, cell_height,
                                              line_length, raw_histogram)
        self._show_visualization(vis_image, 'Raw HOG Visualization')
    # Private helper function to draw cell visualization
    def _draw_cell_visualization(self, draw, x, y, cell_width, cell_height, line_length, histogram):
        cell_center_y = (y + 0.5) * cell_height
        cell_center_x = (x + 0.5) * cell_width
        # Skip cells with no gradient energy so we never divide by zero below
        max_magnitude = np.max(histogram)
        if max_magnitude == 0:
            return
        for orientation_bin in range(self.BAND_COUNT):
            orientation = orientation_bin * (180 / self.BAND_COUNT)
            magnitude = histogram[orientation_bin]
            radian = np.deg2rad(orientation)
            dx = line_length * np.cos(radian) * magnitude / max_magnitude
            dy = line_length * np.sin(radian) * magnitude / max_magnitude
            draw.line([
                (cell_center_x - dx, cell_center_y - dy),
                (cell_center_x + dx, cell_center_y + dy)
            ], fill='white', width=1)
    # Private helper function to show visualization
    def _show_visualization(self, vis_image, title):
        plt.figure(figsize=(10, 15))
        plt.subplot(311)
        plt.title('Original Image')
        plt.imshow(self.input_image, cmap='gray')
        plt.subplot(312)
        plt.title(title)
        plt.imshow(vis_image)
        plt.tight_layout()
        plt.show()
The Pro Version (For Actually Using)
from skimage.feature import hog
from skimage.transform import resize
import numpy as np
import matplotlib.pyplot as plt
class HOGExtractor:
    def __init__(self, image_size=(256, 256), cell_size=(16, 16), block_size=(2, 2), band_count=9):
        self.image_size = image_size
        self.input_image = None
        self.hog_image = None
        self.cell_size = cell_size
        self.block_size = block_size
        self.band_count = band_count
    def extract_features(self, image):
        self.input_image = image
        img_array = np.array(image)
        img_array = resize(img_array, self.image_size)
        features, hog_image = hog(
            img_array,
            orientations=self.band_count,
            pixels_per_cell=self.cell_size,
            cells_per_block=self.block_size,
            visualize=True,
            channel_axis=-1
        )
        self.hog_image = hog_image
        return features
    def visualize(self):
        if self.input_image is None or self.hog_image is None:
            return
        plt.figure(figsize=(10, 5))
        plt.subplot(121)
        plt.title('Original Image')
        plt.imshow(self.input_image)
        plt.axis('off')
        plt.subplot(122)
        plt.title('HOG Visualization')
        plt.imshow(self.hog_image, cmap='gray')
        plt.axis('off')
        plt.tight_layout()
        plt.show()
import numpy as np
import matplotlib.pyplot as plt
class ColorHistogramExtractor:
    def __init__(self, bins=256, channels=3):
        self.bins = bins
        self.channels = channels
        self.image = None
        self.histograms = None
        self.colors = ['red', 'green', 'blue']
        self.channel_names = ['Red', 'Green', 'Blue']

    def load_image(self, pil_image):
        self.image = np.array(pil_image)
        return self
    def extract_features(self, image_array=None, normalize=True):
        if image_array is not None:
            self.load_image(image_array)
        self.histograms = []
        for channel in range(self.channels):
            histogram, _ = np.histogram(
                # Select all pixels for one color channel (R, G, or B) using numpy's
                # ellipsis notation and flatten the 2D array of pixel values into 1D
                self.image[..., channel].ravel(),
                bins=self.bins,  # divides the range into equal-width bins
                range=(0, 256)
            )
            if normalize:
                histogram = histogram / histogram.sum()
            self.histograms.append(histogram)
        return np.concatenate(self.histograms)
    def visualize(self):
        plt.figure(figsize=(10, 6))
        # Plot original image
        plt.subplot(2, 1, 1)
        plt.title('Original Image')
        plt.imshow(self.image)
        plt.axis('off')
        # Plot histograms as bars
        plt.subplot(2, 1, 2)
        plt.title('Color Histograms')
        x = np.linspace(0, 1, self.bins)  # Normalized x-axis [0, 1]
        bar_width = 1.0 / self.bins
        for channel in range(self.channels):
            plt.bar(x, self.histograms[channel],
                    color=self.colors[channel],
                    label=self.channel_names[channel],
                    alpha=0.3,
                    width=bar_width)
        plt.xlabel('Pixel Intensity')
        plt.ylabel('Frequency')
        plt.legend()
        plt.grid(True, alpha=0.3)
        plt.xlim(0, 1)  # Set x-axis limits to [0, 1]
        plt.tight_layout()
        plt.show()
Pro tip: Use scikit-image. It's literally 10x faster. I tested. Unless you're a masochist or doing homework, use the library.
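For reference, here's roughly how I wire the two extractors together on one image before anything goes near the classifier (using the scikit-image HOGExtractor and the ColorHistogramExtractor above; the filename is just a placeholder):

from PIL import Image
import numpy as np

hog_extractor = HOGExtractor()
color_extractor = ColorHistogramExtractor()

image = Image.open("image_0001.jpg").convert("RGB")
hog_features = hog_extractor.extract_features(image)      # 8,100 values
color_features = color_extractor.extract_features(image)  # 768 values

features = np.concatenate([hog_features, color_features])
print(features.shape)  # (8868,)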
The Moment of Truth: Building the Classifier
Time to put it all together with a Random Forest (because they're impossible to kill and always work):
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
# Load your flower dataset (2040 training, 1040 test images)
# Each class has 80 images, math checks out
# Extract features (HOG + Color)
def extract_all_features(image):
    hog_features = hog_extractor.extract_features(image)
    color_features = color_extractor.extract_features(image)
    return np.concatenate([hog_features, color_features])
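# A sketch of building the feature matrices - this assumes you've already loaded the
# dataset into lists of PIL images with matching labels (these variable names are mine,
# not part of the dataset)
X_train = np.array([extract_all_features(img) for img in train_images])
y_train = np.array(train_labels)
X_test = np.array([extract_all_features(img) for img in test_images])
y_test = np.array(test_labels)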
# Build the pipeline
pipeline = Pipeline([
    ('scaler', StandardScaler()),  # Normalize everything
    ('classifier', RandomForestClassifier(
        n_estimators=300,          # 300 trees in this forest
        max_depth=50,              # Deep trees for complex flowers
        min_samples_split=5,       # Don't overfit too hard
        min_samples_leaf=2,
        max_features='sqrt',       # Random subset of features
        class_weight='balanced_subsample'  # Handle imbalanced classes
    ))
])
# Train it
pipeline.fit(X_train, y_train)
# The moment of truth
train_accuracy = pipeline.score(X_train, y_train)
test_accuracy = pipeline.score(X_test, y_test)
print(f"Train: {train_accuracy:.3f}, Test: {test_accuracy:.3f}")
# Train: 1.000, Test: 0.946
The Results Are In: 94.6% Accuracy! 🎉
Train accuracy: 100% (Random Forests memorize everything, the overachievers)
Test accuracy: 94.6% (this is what actually matters)
That 5% gap? That's overfitting, but honestly, for Random Forests with 300 deep trees, it's totally normal. They're supposed to memorize the training data; it's literally their job.
What We Just Built
Let's appreciate what happened here:
- We took a 20-year-old technique (HOG)
- Added basic color counting
- Threw in a Random Forest
- Beat most humans at flower identification
Your computer can now:
- Detect petal shapes through edge patterns
- Remember color distributions
- Combine both to identify flowers with 94.6% accuracy
All without a single neural network. Take that, deep learning! (Just kidding, I love neural networks too)
Level Up Your Flower Game
Want to push past 94%? Try these:
- Local Binary Patterns (LBP): Another texture descriptor that's stupid simple but effective
- Different classifiers: An SVM might squeeze out another percent or two (see the sketch after this list)
- Data augmentation: Rotate those flowers, flip them, make them work for it
- Deep features: Fine, use a neural network for features but keep the Random Forest (hybrid approach)
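If you want to try the SVM route, it's basically a one-line swap in the pipeline (a rough sketch with untuned parameters, reusing X_train and y_train from before):

from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

# Same recipe as before, just with an RBF-kernel SVM instead of the Random Forest
svm_pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', SVC(kernel='rbf', C=10, gamma='scale', class_weight='balanced'))
])
svm_pipeline.fit(X_train, y_train)
print(svm_pipeline.score(X_test, y_test))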
You're a Computer Vision Expert Now (Or Almost)
Seriously, you just understood:
- Edge detection with personality (HOG)
- Why normalization matters (silly shadows)
- How to combine shape and color features
- Building a classifier that actually works
This isn't some toy example; it's legitimate computer vision that was state-of-the-art not that long ago and STILL works brilliantly for many real-world problems.
The best part? You can explain every single step. No black boxes, no "the network learned something," just pure, understandable math doing its thing.
Want the deep dive? Check out the full technical analysis on my blog