DEV Community

Ertugrul

🎨 Real-time Image Color Palette Extractor — A Deep Dive into K-means, LAB, and ΔE2000

Introduction

This project is a real-time image color palette extractor built with Python, OpenCV, and Streamlit.
The application analyzes an uploaded image, extracts dominant colors using K-means clustering, and matches them with the closest Tailwind CSS colors in the LAB color space using the ΔE2000 perceptual difference formula.

🔗 Live Demo: Real-time Palette Extractor
💻 Source Code: GitHub Repository
👤 Author: Ertuğrul Mutlu


1. System Architecture

📂 project_root
 ├── main.py               # Streamlit app entry point
 ├── core
 │   ├── extractor.py      # K-means clustering & palette extraction
 │   ├── tailwind.py       # Tailwind color data & matching functions
 │   ├── color_ops.py      # Color conversion & ΔE2000 calculations
 │   └── ui.py             # UI rendering components for Streamlit
 └── requirements.txt      # Python dependencies

Key benefits of this modular design:

  • Maintainability: Easy to swap algorithms or UI components.
  • Reusability: Core logic can be reused in CLI tools or APIs.
  • Performance Tuning: Optimization in one module won’t affect others.

2. Color Extraction Pipeline

2.1 Image Preprocessing

We read the image with Pillow, ensure RGB format, and resize to optimize speed without sacrificing visual accuracy.

from PIL import Image
import cv2
import numpy as np

def preprocess_image(file, max_side=1024):
    img = Image.open(file).convert("RGB")
    arr = np.array(img)
    h, w, _ = arr.shape
    if max(h, w) > max_side:
        scale = max_side / max(h, w)
        arr = cv2.resize(arr, (int(w*scale), int(h*scale)), interpolation=cv2.INTER_AREA)
    return arr

Highlights:

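  • Caps the longest side at max_side (1024 px by default), so even multi-megabyte uploads are reduced to roughly one megapixel at most before clustering.
  • Uses INTER_AREA interpolation, which averages source pixels when shrinking and so limits aliasing and moiré.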

2.2 Dominant Color Extraction with K-means

We limit pixel sampling to ~400k points to avoid memory bottlenecks while preserving accuracy. (The excerpt below omits the sampling step and clusters every pixel of the downscaled image.)

def extract_palette(image_array, k=6):
    pixels = image_array.reshape(-1, 3).astype(np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 40, 0.2)
    _, labels, centers = cv2.kmeans(
        pixels, k, None, criteria, attempts=3, flags=cv2.KMEANS_PP_CENTERS
    )
    centers = np.clip(centers, 0, 255).astype(np.uint8)
    return centers
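The ~400k cap mentioned above could be applied before clustering with a helper along these lines. This is a minimal sketch rather than the project's actual code: `sample_pixels`, `max_samples`, and `seed` are hypothetical names chosen for illustration.

```python
import numpy as np

def sample_pixels(image_array, max_samples=400_000, seed=42):
    """Return at most max_samples RGB rows as float32, ready for cv2.kmeans.

    Hypothetical helper: the names and defaults here are assumptions.
    """
    pixels = image_array.reshape(-1, 3)
    if len(pixels) > max_samples:
        rng = np.random.default_rng(seed)  # fixed seed -> repeatable sample
        idx = rng.choice(len(pixels), size=max_samples, replace=False)
        pixels = pixels[idx]
    return pixels.astype(np.float32)
```

The result can be passed to cv2.kmeans in place of the full `pixels` array; images already under the cap pass through unchanged.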

Why K-means?

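  • Stable clustering: with k-means++ initialization and several attempts, repeated runs on the same image tend to land on similar centers.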
  • Efficient with OpenCV’s optimizations.
  • Easily adjustable cluster count.

3. Tailwind Color Matching

3.1 LAB Conversion

We convert both extracted colors and Tailwind palette entries from RGB to LAB for perceptually accurate distance measurement.

from skimage.color import rgb2lab
# tw_rgb is assumed to be an (N, 3) array of 0-255 Tailwind RGB values
lab_palette = rgb2lab(tw_rgb / 255.0)

3.2 ΔE2000 Matching

from skimage.color import deltaE_ciede2000

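def nearest_tailwind(rgb):
    # Scale the 8-bit color to [0, 1]; rgb2lab expects float RGB in that range.
    rgb_arr = np.array([rgb], dtype=np.uint8) / 255.0
    lab1 = rgb2lab(rgb_arr.reshape(1, 1, 3)).reshape(-1, 3)
    # lab1 has shape (1, 3) and broadcasts against the (N, 3) palette.
    dists = deltaE_ciede2000(lab1, lab_palette)
    idx = np.argmin(dists)
    return tailwind_entries[idx], dists[idx]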

Advantages:

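  • ΔE2000 (CIEDE2000) is the CIE's current recommended color-difference formula and is widely used in print, branding, and design workflows.
  • Matches track how the human eye judges color differences more closely than plain RGB distance does.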

4. UI/UX Features

Built in Streamlit, the UI includes:

  • Sidebar with adjustable K value.
  • Toggle for Tailwind match overlay.
  • Color swatches with HEX/RGB.
  • Export to JSON, CSS vars, Tailwind config.

import streamlit as st

def display_color_block(hex_value, label):
    st.markdown(
        f"""
        <div style='width:100%; height:50px; border-radius:8px; background:{hex_value}; border:1px solid #ccc;'></div>
        <p>{label}</p>
        """,
        unsafe_allow_html=True
    )
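The export options listed above might look roughly like this. It is a sketch under assumptions, not the app's real exporters: the function names, the `--color-N` variable naming, and the `palette` key are all invented for illustration.

```python
import json

def rgb_to_hex(rgb):
    """(r, g, b) values in 0..255 -> '#rrggbb'."""
    r, g, b = (int(v) for v in rgb)
    return f"#{r:02x}{g:02x}{b:02x}"

def export_json(palette):
    """Serialize the palette as a JSON list of hex strings (assumed layout)."""
    return json.dumps({"palette": [rgb_to_hex(c) for c in palette]}, indent=2)

def export_css_vars(palette, prefix="color"):
    """Emit one CSS custom property per color inside a :root block."""
    lines = [f"  --{prefix}-{i}: {rgb_to_hex(c)};"
             for i, c in enumerate(palette, start=1)]
    return ":root {\n" + "\n".join(lines) + "\n}"

def export_tailwind_config(palette, name="palette"):
    """Emit a theme.extend.colors fragment; JSON is valid as a JS object literal."""
    colors = {str(i): rgb_to_hex(c) for i, c in enumerate(palette, start=1)}
    body = json.dumps({"theme": {"extend": {"colors": {name: colors}}}}, indent=2)
    return f"module.exports = {body}"
```

Each function returns a string, so it can be handed to a download button or written to disk without further processing.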

5. Applications

This tool is relevant for:

  • Design systems: Extracting brand colors from assets.
  • Web dev: Tailwind CSS color mapping.
  • Marketing: Ensuring campaign color consistency.
  • Art projects: Creating palettes from images.

6. Theory & Metrics (Deeper)

6.1 K-means, Initialization, and Complexity

  • Initialization: OpenCV uses k-means++ when flags=cv2.KMEANS_PP_CENTERS, which spreads initial centers and reduces poor local minima.
  • Iterations: Each iteration alternates assignment (nearest-center) and update (mean of assigned points).
  • Complexity: Roughly O(N × K × I), where N = sampled pixels, K = clusters, I = iterations. This is why we downscale and subsample.

Practical tip: sampling ~200k–400k pixels is usually indistinguishable from using all pixels for palette purposes but much faster.
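To make the assignment/update loop and its N × K cost concrete, here is one iteration written in plain NumPy. It is an illustrative sketch only; the app itself relies on cv2.kmeans, and `lloyd_step` is a hypothetical name.

```python
import numpy as np

def lloyd_step(points, centers):
    """One k-means iteration: assign each point to its nearest center, then update."""
    points = np.asarray(points, dtype=np.float64)
    centers = np.asarray(centers, dtype=np.float64)
    # Squared distances, shape (N, K): the N x K work paid on every iteration.
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    new_centers = centers.copy()
    for k in range(len(centers)):
        members = points[labels == k]
        if len(members):  # keep the old center if a cluster ends up empty
            new_centers[k] = members.mean(axis=0)
    return labels, new_centers
```

Repeating this step I times gives the O(N × K × I) figure quoted above, which is why shrinking N through downscaling and sampling pays off directly.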

6.2 RGB vs HSV vs LAB

| Space | What it represents | Pros | Cons |
|-------|--------------------|------|------|
| RGB | Device primaries | Native for images, simple | Not perceptually uniform |
| HSV | Hue/Saturation/Value | Intuitive UI knobs | Still not uniform; hue wraparound is tricky |
| LAB | Lightness + opponent axes | Approx. perceptual uniformity | Conversion overhead |

Why LAB? Distance in LAB correlates better with human perception, making nearest-color matching far more reliable.
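For readers curious about what the RGB → LAB conversion involves, the sketch below spells it out for a single color in pure Python. It assumes the D65 white point and the standard sRGB-to-XYZ matrix; `srgb_to_lab` is a hypothetical name, and the app itself uses a library routine for this step.

```python
def srgb_to_lab(rgb):
    """Convert an 8-bit sRGB triple to CIE L*a*b* (D65 white point assumed)."""
    def to_linear(c):
        # Undo the sRGB transfer curve
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (to_linear(v) for v in rgb)
    # Linear sRGB -> CIE XYZ (D65)
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b

    def f(t):
        # Cube root with a linear segment near zero
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    # Normalize by the D65 reference white before applying f
    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)
```

White maps to L ≈ 100 with a and b near zero, and black maps to L = 0, which is a quick sanity check for any LAB implementation.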

6.3 ΔE Variants

  • ΔE76: Euclidean distance in LAB (simple, fast, but less accurate).
  • ΔE94: Adds weighting for chroma/lightness differences.
  • ΔE2000: Current standard; includes weighting + a rotation term R_T to handle blue region non-linearities.

Interpretation (rules of thumb):

  • ΔE < 1: nearly imperceptible
  • 1–2: perceptible through close observation
  • 2–10: perceptible at a glance
  • > 10: large difference
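As a small worked example, ΔE76 and the rule-of-thumb bands above fit in a few lines of pure Python. The function names are hypothetical, and the band boundaries simply mirror the list above.

```python
import math

def delta_e76(lab1, lab2):
    """CIE76: plain Euclidean distance between two (L, a, b) triples."""
    return math.dist(lab1, lab2)

def describe_delta_e(de):
    """Map a ΔE value to the rule-of-thumb bands listed above."""
    if de < 1:
        return "nearly imperceptible"
    if de <= 2:
        return "perceptible through close observation"
    if de <= 10:
        return "perceptible at a glance"
    return "large difference"
```

For instance, (50, 0, 0) and (53, 4, 0) are 5 units apart under ΔE76, which falls in the "perceptible at a glance" band.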

6.4 Our Vectorized ΔE2000

We use a fully vectorized ΔE2000 implementation so a single LAB color can be compared to the entire Tailwind palette efficiently:

# color_ops.py (excerpt)
import numpy as np

def deltaE2000(lab1: np.ndarray, lab2: np.ndarray) -> np.ndarray:
    L1, a1, b1 = lab1[:, 0:1], lab1[:, 1:2], lab1[:, 2:3]
    L2, a2, b2 = lab2[None, :, 0], lab2[None, :, 1], lab2[None, :, 2]
    C1 = np.sqrt(a1**2 + b1**2); C2 = np.sqrt(a2**2 + b2**2)
    C_bar = (C1 + C2) / 2.0
    C_bar7 = C_bar**7
    G = 0.5 * (1 - np.sqrt(C_bar7 / (C_bar7 + (25.0**7))))
    a1p = (1 + G) * a1; a2p = (1 + G) * a2
    C1p = np.sqrt(a1p**2 + b1**2); C2p = np.sqrt(a2p**2 + b2**2)
    def _atan2(y, x):
        ang = np.arctan2(y, x); return np.where(ang < 0, ang + 2*np.pi, ang)
    h1p = _atan2(b1, a1p); h2p = _atan2(b2, a2p)
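    # Use "sample 2 minus sample 1" throughout so dCp and dHp share a sign
    # convention; mixing conventions flips the sign of the RT cross term.
    dLp = L2 - L1; dCp = C2p - C1p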
    dhp = h2p - h1p
    dhp = np.where(dhp >  np.pi, dhp - 2*np.pi, dhp)
    dhp = np.where(dhp < -np.pi, dhp + 2*np.pi, dhp)
    dHp = 2.0 * np.sqrt(C1p * C2p) * np.sin(dhp / 2.0)
    Lp_bar = (L1 + L2) / 2.0; Cp_bar = (C1p + C2p) / 2.0
    hp_bar = (h1p + h2p) / 2.0
    hp_bar = np.where(np.abs(h1p - h2p) > np.pi, hp_bar + np.pi, hp_bar)
    hp_bar = np.where(hp_bar >= 2*np.pi, hp_bar - 2*np.pi, hp_bar)
    T = (1 - 0.17*np.cos(hp_bar - np.deg2rad(30))
           + 0.24*np.cos(2*hp_bar)
           + 0.32*np.cos(3*hp_bar + np.deg2rad(6))
           - 0.20*np.cos(4*hp_bar - np.deg2rad(63)))
    SL = 1 + (0.015 * (Lp_bar - 50)**2) / np.sqrt(20 + (Lp_bar - 50)**2)
    SC = 1 + 0.045 * Cp_bar
    SH = 1 + 0.015 * Cp_bar * T
    delta_theta = np.deg2rad(30) * np.exp(- ((np.rad2deg(hp_bar) - 275) / 25)**2)
    RC = 2 * np.sqrt(Cp_bar**7 / (Cp_bar**7 + 25.0**7))
    RT = -np.sin(2 * delta_theta) * RC
    dE = np.sqrt((dLp / SL)**2 + (dCp / SC)**2 + (dHp / SH)**2 + RT * (dCp / SC) * (dHp / SH))
    return dE.astype(np.float32)

7. Performance & Caching

7.1 Sampling & Downscaling

  • Downscale to max_side=1024 (configurable) to cap pixels.
  • Subsample to ~400k pixels for k-means input.

7.2 Streamlit Caching

Use @st.cache_resource to store Tailwind entries + LAB matrix:

@st.cache_resource
def _load_tailwind_cache():
    entries, lab = build_tailwind_entries_and_lab_remote()
    return entries, lab

This avoids re-downloading and re-converting on every rerun.

7.3 Vectorization Everywhere

  • Distance computations are pure NumPy (no Python loops), which is typically far faster than comparing colors one pair at a time in Python.
  • Weight computation assigns each (downscaled) pixel to a centroid in one vectorized pass.
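The weight computation described above can be sketched as a single broadcasted distance matrix followed by a bincount. This is an assumption about how it could be done, not the project's exact code; `cluster_weights` is a hypothetical name.

```python
import numpy as np

def cluster_weights(pixels, centers):
    """Fraction of pixels assigned to each center (the fractions sum to 1)."""
    pixels = np.asarray(pixels, dtype=np.float32)
    centers = np.asarray(centers, dtype=np.float32)
    # (N, K) squared distances via broadcasting, no Python loop over pixels
    d2 = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    # minlength keeps a zero entry for centers that attracted no pixels
    counts = np.bincount(labels, minlength=len(centers))
    return counts / counts.sum()
```

When the weights are only needed for the pixels that were clustered, the labels returned by cv2.kmeans can be fed to np.bincount directly, skipping the distance matrix.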

8. Tailwind Palette — Remote Fetch + Fallback

We try the jsDelivr CDN, UNPKG, and GitHub raw in that order; if all of them fail, we fall back to a local subset.

# tailwind_remote-like approach (concept)
CDN_CANDIDATES = [
  "https://cdn.jsdelivr.net/npm/tailwindcss@3.4.10/src/public/colors.js",
  "https://unpkg.com/tailwindcss@3.4.10/src/public/colors.js",
  "https://raw.githubusercontent.com/tailwindlabs/tailwindcss/master/src/public/colors.js",
]

For portability (and Streamlit Cloud), ensure requests + json5 are in requirements.txt.
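The try-in-order logic could be structured as below. This sketch uses the standard library's urllib instead of requests to stay dependency-free, and `fetch_text` and `first_available` are hypothetical names; parsing the downloaded colors.js is left out.

```python
from urllib.request import urlopen

def fetch_text(url, timeout=5):
    """Download a URL and return its body as text (raises on failure)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")

def first_available(candidates, fetch=fetch_text):
    """Try each URL in order; return (url, text), or (None, None) if all fail."""
    for url in candidates:
        try:
            return url, fetch(url)
        except Exception:
            continue  # network, HTTP, or decode error: try the next source
    return None, None
```

A caller would pass CDN_CANDIDATES and switch to the bundled local subset when the result is (None, None). Accepting `fetch` as a parameter also makes the fallback path easy to test without network access.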


9. Accessibility Math (WCAG 2.x)

We compute relative luminance and contrast ratio to choose, for each swatch, whichever text color (black or white) gives the higher contrast.

# color_ops.py (excerpt)

def relative_luminance(rgb):
    def _srgb_to_lin(c):
        c = c/255.0
        return c/12.92 if c <= 0.04045 else ((c + 0.055)/1.055)**2.4
    r,g,b = rgb
    R,G,B = _srgb_to_lin(r), _srgb_to_lin(g), _srgb_to_lin(b)
    return 0.2126*R + 0.7152*G + 0.0722*B


def contrast_ratio(rgb1, rgb2):
    L1 = relative_luminance(rgb1); L2 = relative_luminance(rgb2)
    L1, L2 = (L1, L2) if L1 >= L2 else (L2, L1)
    return (L1 + 0.05) / (L2 + 0.05)

Badge mapping used in the UI:

  • ≥ 7.0 → AAA
  • ≥ 4.5 → AA
  • ≥ 3.0 → AA (Large)
  • else → N/A

10. Sorting & Semantics

Beyond frequency (weight), we support hue and luminance sorting to make palettes more semantically meaningful.

import colorsys

def rgb_to_hsv_deg(rgb):
    r,g,b = [v/255.0 for v in rgb]
    h,s,v = colorsys.rgb_to_hsv(r,g,b)
    return h*360.0, s, v

This makes it easy to reorder UI tokens: primary → secondary → accent, etc.
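A sort helper built on the same idea could look like this. It is a sketch with hypothetical names (`hue_deg`, `luminance`, `sort_palette`), and it uses WCAG-style relative luminance as the luminance key, which is an assumption about what the app sorts by.

```python
import colorsys

def hue_deg(rgb):
    """Hue in degrees (0..360) for an 8-bit RGB triple."""
    r, g, b = (v / 255.0 for v in rgb)
    return colorsys.rgb_to_hsv(r, g, b)[0] * 360.0

def luminance(rgb):
    """Relative luminance (0..1) computed from linearized sRGB channels."""
    def lin(c):
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (lin(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def sort_palette(colors, by="hue"):
    """Return the colors sorted ascending by hue or by luminance."""
    key = hue_deg if by == "hue" else luminance
    return sorted(colors, key=key)
```

Sorting pure blue, green, and red by hue gives red, green, blue (0°, 120°, 240°), while sorting by luminance gives blue, red, green.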


11. Reproducibility & Determinism

  • Fix RNG seeds for sampling (seed=42).
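  • K-means may still vary slightly between runs because OpenCV draws its initial centers from its own random generator; calling cv2.setRNGSeed before cv2.kmeans should pin that down. Locking specific centroids is a possible future (advanced) feature for fully deterministic outputs.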

12. Edge Cases & Pitfalls

  • Grayscale/monochrome images → many centers converge; expect similar hex values. Consider decreasing k automatically when variance is low.
  • Highly compressed JPEGs → block artifacts may create false small clusters; slight Gaussian blur can help.
  • Overexposed/underexposed images → weights biased toward very light/dark tones; consider histogram clipping.
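For the monochrome case, one simple option is to deduplicate centers after clustering rather than to pick k from the image variance up front. The sketch below does that in RGB space; `merge_similar_centers` is a hypothetical name and the `min_dist` threshold of 8 is an arbitrary illustration, not a tuned value.

```python
import math

def merge_similar_centers(centers, min_dist=8.0):
    """Drop centers that lie within min_dist (RGB Euclidean) of one already kept.

    Centers are visited in order, so pass them sorted by weight to keep
    the most dominant representative of each near-duplicate group.
    """
    kept = []
    for c in centers:
        if all(math.dist(c, k) >= min_dist for k in kept):
            kept.append(tuple(c))
    return kept
```

On a grayscale photo where several centers collapse to nearly the same gray, this trims the palette to the distinct tones without rerunning k-means.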

13. Benchmarking & Profiling

  • Use timeit to compare ΔE76 vs ΔE2000 matching speed.
  • Profile with cProfile or line_profiler specifically around k-means and distance functions.
  • Cache Tailwind LAB and reuse across sessions.

14. Deployment Notes (Streamlit Cloud)

  • Ensure requirements.txt includes: streamlit, numpy, Pillow, opencv-python-headless, requests, json5 (plus scikit-image if you use the skimage.color helpers shown in Section 3).
  • No secrets needed. App is fully stateless.
  • Test on mobile; the grid is responsive (auto-fit min 220px).

15. Links
