Introduction
Traditional keyword search fails for color matching. A customer searching for "burgundy" won't find "wine red" or "maroon," even though these colors are visually almost identical. The problem goes beyond vocabulary: human color perception is far richer than our limited naming system. While the human eye can distinguish millions of shades, we use only a few hundred common color names. Most colors exist in the unnamed spaces between "navy" and "royal blue," or "burgundy" and "crimson."
Simple RGB (Red, Green, Blue) distance calculations don't close this gap. Two color pairs separated by the same numerical distance can look completely different in one case and nearly identical in the other. Because RGB describes how screens display color rather than how humans perceive it, it fails to recognize real-world similarities, especially when lighting or device conditions change.
To close this gap, we should switch from RGB to CIELAB, a color space designed to align with human vision. LAB describes color in terms of lightness and opponent color channels (green to red, blue to yellow), creating distances that reflect perceptual differences. This makes it ideal for comparing colors under varying lighting, shadows, or image quality.
We applied this approach in counterfeit detection. By indexing garments' colors in LAB and monitoring marketplace images, we detected suspicious listings where the perceptual distance ΔE exceeded a tuned threshold (ΔE > 15). Combined with metadata and text analysis, this reduced false positives and cut manual review workload in our proof of concept.
This article demonstrates how to build a production-ready perceptual color similarity search using Amazon OpenSearch Service with k-nearest neighbor (k-NN) capabilities and the CIELAB color space, a combination that enables systems to see color the way humans do.
Why RGB Distance Fails
RGB (Red, Green, Blue) is built for displaying color on screens, not for measuring how similar two colors look. Distances in RGB space often disagree with human perception.
Consider two pairs with the same RGB distance:
Example 1: Same distance, very different perception
- Dark blue RGB(30, 30, 60) vs olive RGB(60, 60, 30)
- Euclidean distance: 52
- Human perception: colors are completely different (ΔE ≈ 25)
Example 2: Same distance, nearly identical perception
- Dark red RGB(200, 100, 100) vs light red RGB(230, 130, 130)
- Euclidean distance: 52
- Human perception: colors are similar (ΔE ≈ 7)
The problem
Identical numerical distances can produce opposite visual outcomes. RGB distance does not predict how people see color differences because brightness and hue interactions matter far more than simple channel-wise arithmetic.
Why this happens
RGB treats red, green, and blue as independent, equally weighted axes. Human vision does not. Our eyes respond nonlinearly to brightness (greater sensitivity in darker ranges) and encode color through opponent channels (red vs green, blue vs yellow). As a result, equal RGB distances rarely correspond to equal perceptual differences.
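A minimal sketch of this mismatch using scikit-image's rgb2lab and deltaE_cie76 (the exact ΔE values depend on the formula and white point, so they will differ somewhat from the approximate figures quoted above):
import numpy as np
from skimage.color import rgb2lab, deltaE_cie76

def rgb_distance(c1, c2):
    """Plain Euclidean distance in RGB space."""
    return float(np.linalg.norm(np.asarray(c1, dtype=float) - np.asarray(c2, dtype=float)))

def lab_distance(c1, c2):
    """ΔE76: Euclidean distance in CIELAB space."""
    lab1 = rgb2lab(np.asarray([[c1]], dtype=float) / 255.0)[0, 0]
    lab2 = rgb2lab(np.asarray([[c2]], dtype=float) / 255.0)[0, 0]
    return float(deltaE_cie76(lab1, lab2))

pairs = {
    "dark blue vs olive": ((30, 30, 60), (60, 60, 30)),
    "dark red vs light red": ((200, 100, 100), (230, 130, 130)),
}
for name, (c1, c2) in pairs.items():
    # Same RGB distance (~52) for both pairs, very different perceptual distance
    print(f"{name}: RGB {rgb_distance(c1, c2):.0f}, ΔE76 {lab_distance(c1, c2):.1f}")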
The Solution: CIELAB Color Space
To align computer vision with human perception, we need a different color space. CIELAB (commonly written as LAB) is an international standard color space designed by the Commission Internationale de l'Éclairage to be perceptually uniform. In LAB, the same numerical distance corresponds to roughly the same perceived color difference, regardless of whether you're comparing dark blues, bright yellows, or muted grays. This perceptual uniformity makes LAB ideal for similarity search.
LAB Structure
LAB separates color into three components that mirror how human vision processes color:
- L* (Lightness): 0 (black) to 100 (white), roughly aligned to perceived brightness
- a*: green–red opponent channel; negative = green, positive = red (≈ −128 to +128)
- b*: blue–yellow opponent channel; negative = blue, positive = yellow (≈ −128 to +128)
ΔE (Delta E): Measuring Perceptual Distance
In LAB space, the Euclidean distance between two colors is ΔE (Delta E):
ΔE76 = √[(L₂ - L₁)² + (a₂ - a₁)² + (b₂ - b₁)²]
Indicative interpretation based on empirical studies:
- ΔE ≤ 1: Not perceptible under normal viewing
- ΔE 1–2: Perceptible with close observation
- ΔE 2–10: Noticeable; "similar but slightly different"
- ΔE > 10: Clearly different
For most applications, ΔE76 (simple Euclidean distance) is sufficient. For precision-critical cases (e.g., cosmetics, paint), use ΔE2000, which compensates for known non-uniformities (notably in blue regions).
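As a quick illustration of the formula, here is a minimal ΔE76 helper. The first LAB triple matches the burgundy conversion example in Step 1; the second is a hypothetical nearby "wine red" chosen only for the example.
import math

def delta_e76(lab1, lab2):
    """ΔE76: straight Euclidean distance between two LAB colors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(lab1, lab2)))

burgundy = (40.4, 58.3, 33.0)   # from the rgb_to_lab(184, 33, 45) example in Step 1
wine_red = (38.9, 55.1, 28.7)   # hypothetical nearby shade
print(f"ΔE76 = {delta_e76(burgundy, wine_red):.1f}")  # falls in the 2-10 "similar but slightly different" band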
Architecture Overview
The pipeline extracts representative colors, converts them to LAB, and indexes the resulting vectors for fast similarity search.
Once colors are in LAB space, finding similar colors becomes a standard k-NN problem that OpenSearch's vector search capabilities handle efficiently.
Implementation
Step 1: RGB to LAB Conversion
First, extract a representative color (e.g., with OpenCV k-means clustering over product pixels, Amazon Rekognition features, or a masked-region average for the product area). Then convert RGB to LAB using colormath:
from colormath.color_objects import sRGBColor, LabColor
from colormath.color_conversions import convert_color

def rgb_to_lab(r, g, b):
    """
    Convert RGB (0-255) to a normalized LAB vector.
    Normalization keeps dimensions on comparable scales for k-NN.
    Without this, L* (0-100 range) would dominate distances.
    """
    rgb = sRGBColor(r, g, b, is_upscaled=True)
    lab = convert_color(rgb, LabColor)
    return [
        lab.lab_l / 100.0,  # L* [0,100] -> [0,1]
        lab.lab_a / 128.0,  # a* [-128,127] -> ~[-1,1]
        lab.lab_b / 128.0   # b* [-128,127] -> ~[-1,1]
    ]

# Example: Convert a burgundy coat color
lab_vector = rgb_to_lab(184, 33, 45)
print(lab_vector)  # [0.4036, 0.4555, 0.2576]
Multi-color products: For items with several prominent colors, either (a) index the dominant color (simple, smaller index), or (b) index the top N colors as separate docs sharing the same product_id (better recall; merge duplicates at read time).
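For the dominant-color strategy, a minimal extraction sketch with OpenCV k-means, assuming the image has already been cropped or masked to the product pixels (the file name is hypothetical):
import cv2
import numpy as np

def dominant_rgb(image_bgr, k=3):
    """Cluster pixels with k-means and return the largest cluster's center as (r, g, b)."""
    pixels = image_bgr.reshape(-1, 3).astype(np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
    _, labels, centers = cv2.kmeans(pixels, k, None, criteria, 5, cv2.KMEANS_PP_CENTERS)
    counts = np.bincount(labels.flatten())
    b, g, r = centers[counts.argmax()]  # OpenCV loads images as BGR
    return int(r), int(g), int(b)

# image = cv2.imread("product.jpg")            # hypothetical path, pre-cropped to the garment
# lab_vec = rgb_to_lab(*dominant_rgb(image))   # reuse the converter from above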
Step 2: Create OpenSearch Index
Security note (production): Use VPC placement, IAM roles or fine-grained access control, and sign REST calls with AWS Signature Version 4.
PUT /product-colors
{
  "settings": {
    "index.knn": true,
    "index.number_of_shards": 2,
    "index.number_of_replicas": 1
  },
  "mappings": {
    "properties": {
      "product_id": { "type": "keyword" },
      "title": { "type": "text" },
      "color_name": { "type": "keyword" },
      "lab_vector": {
        "type": "knn_vector",
        "dimension": 3,
        "method": {
          "name": "hnsw",
          "engine": "lucene",
          "space_type": "l2",
          "parameters": {
            "ef_construction": 128,
            "m": 24
          }
        }
      },
      "brand": { "type": "keyword" },
      "price": { "type": "float" }
    }
  }
}
The lab_vector field uses space_type: "l2" (Euclidean distance), which aligns with ΔE76. HNSW provides fast approximate nearest neighbors; tune m and ef_construction for your scale and accuracy needs.
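The same mapping can be applied from Python with opensearch-py (a sketch; index_body holds the settings/mappings JSON above as a dict, and client is configured as in Step 3):
# index_body = the settings/mappings JSON shown above, as a Python dict
if not client.indices.exists(index="product-colors"):
    client.indices.create(index="product-colors", body=index_body)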
Step 3: Index Products
from opensearchpy import OpenSearch, helpers

client = OpenSearch(endpoint, http_auth=(user, password))

actions = []
for product in products:
    lab_vec = rgb_to_lab(*product['rgb'])
    actions.append({
        "_index": "product-colors",
        "_id": product['id'],
        "_source": {
            "product_id": product['id'],
            "title": product['title'],
            "color_name": product.get('color_name', 'Unknown'),
            "lab_vector": lab_vec,
            "brand": product['brand'],
            "price": product['price']
        }
    })

success, errors = helpers.bulk(client, actions)
print(f"Indexed {success} documents")
if errors:
    print(f"Errors: {errors}")
Step 4: Query Similar Colors
Basic similarity:
POST /product-colors/_search
{
  "size": 20,
  "query": {
    "knn": {
      "lab_vector": {
        "vector": [0.54, 0.64, 0.52],
        "k": 50
      }
    }
  }
}
(Fetch k=50 candidates to improve recall, then return size=20 to keep payloads small.)
Combine color similarity with business filters to ensure relevance:
POST /product-colors/_search
{
  "size": 20,
  "query": {
    "bool": {
      "must": [
        {
          "knn": {
            "lab_vector": {
              "vector": [0.54, 0.64, 0.52],
              "k": 50
            }
          }
        }
      ],
      "filter": [
        { "term": { "brand": "Premium Outerwear Co." } },
        { "range": { "price": { "lte": 500 } } }
      ]
    }
  }
}
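The same filtered query can be run from Python, converting the query color on the fly. This is a sketch that reuses rgb_to_lab and the client from Step 3; the query RGB value, brand, and price cap are hypothetical examples.
query_vec = rgb_to_lab(139, 0, 32)  # hypothetical burgundy query color
response = client.search(
    index="product-colors",
    body={
        "size": 20,
        "query": {
            "bool": {
                "must": [
                    {"knn": {"lab_vector": {"vector": query_vec, "k": 50}}}
                ],
                "filter": [
                    {"term": {"brand": "Premium Outerwear Co."}},
                    {"range": {"price": {"lte": 500}}}
                ]
            }
        }
    }
)
candidates = [hit["_source"] for hit in response["hits"]["hits"]]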
Step 5: Optional ΔE2000 Re-Ranking
Use ΔE2000 when tiny shade differences matter (cosmetics, paint, textiles). For general e-commerce, ΔE76 is typically sufficient and faster.
import numpy as np
from skimage.color import deltaE_ciede2000

def rerank_with_delta_e2000(query_lab_vec, candidates, top_n=10):
    """Re-rank candidates using ΔE2000 for maximum perceptual accuracy."""
    # Undo the indexing normalization to get back to native LAB units
    query_lab = np.array([
        query_lab_vec[0] * 100.0,   # L* [0,100]
        query_lab_vec[1] * 128.0,   # a* [-128,127]
        query_lab_vec[2] * 128.0    # b* [-128,127]
    ])
    scored = []
    for cand in candidates:
        lab_vec = cand['lab_vector']
        cand_lab = np.array([
            lab_vec[0] * 100.0,
            lab_vec[1] * 128.0,
            lab_vec[2] * 128.0
        ])
        delta_e = float(deltaE_ciede2000(query_lab, cand_lab))
        scored.append((delta_e, cand))
    scored.sort(key=lambda x: x[0])
    return [cand for _, cand in scored[:top_n]]
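Typical usage on the hits returned by the filtered query above (a sketch; query_vec and candidates come from the Step 4 search example):
top_matches = rerank_with_delta_e2000(query_vec, candidates, top_n=10)
for match in top_matches:
    print(match["product_id"], match["color_name"])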
Real-World Use Cases
Fashion E-Commerce: Alternative Product Recommendations
Index each product's dominant color in LAB and use a moderate ΔE threshold (up to ~8) to include related shades (wine, maroon, oxblood). Combine with size/brand/category filters to keep results relevant.
Cosmetics: Precise Shade Matching
Use tight ΔE thresholds (< 2) plus ΔE2000 re-ranking. Optionally filter by undertone (warm/cool/neutral). This reduces returns and builds trust.
Brand Protection: Counterfeit Detection
Detect subtle color deviations in logos/branding. Index genuine logo LAB vectors and monitor marketplace listings for significant deviations; flag when ΔE > 15 for review. This approach reduced manual review workload by ~40% in a PoC and complements image/text analysis pipelines.
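When scanning large numbers of marketplace listings, a batch variant of the Step 5 re-ranker, rerank_with_delta_e2000_batch, computes ΔE2000 for all candidates in one vectorized NumPy call: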
import numpy as np
from skimage.color import deltaE_ciede2000

def rerank_with_delta_e2000_batch(query_lab_vec, candidates, top_n=10):
    """Vectorized ΔE2000 re-ranking for large candidate sets."""
    if not candidates:
        return []
    # Prepare query in native LAB units
    q = np.array([query_lab_vec[0] * 100.0, query_lab_vec[1] * 128.0, query_lab_vec[2] * 128.0], dtype=np.float64)
    # Build candidate array (n, 3)
    cand_arr = np.array([
        [c['lab_vector'][0] * 100.0, c['lab_vector'][1] * 128.0, c['lab_vector'][2] * 128.0]
        for c in candidates
    ], dtype=np.float64)
    # Compute ΔE2000 for all candidates at once
    q_rep = np.repeat(q[np.newaxis, :], cand_arr.shape[0], axis=0)
    delta_es = deltaE_ciede2000(q_rep, cand_arr)
    # Sort and return top_n
    idx = np.argsort(delta_es)[:top_n]
    return [candidates[i] for i in idx]
Best Practices
Implementation
- Standardize photography (D65 ~6500K) and camera settings.
- Work in LAB; avoid raw RGB similarity.
- Handle backgrounds (segmentation/cropping to product pixels).
- Choose color strategy: dominant color vs. top-N colors per item.
Performance & Scale
- Start with ΔE76; add ΔE2000 only if user tests require it.
- Combine with business filters (category, brand, price, size).
- Tune HNSW parameters (m, ef_construction, and ef_search).
Security & Operations
- Secure the domain (VPC, IAM/FGAC, TLS, SigV4).
- Alarms for p95 latency and memory pressure.
- Iterate using CTR, conversion, complaints, and latency telemetry.
Validation
- User tests to calibrate ΔE thresholds per domain.
- A/B pilots before full rollout; monitor CTR, conversion, bounce, returns.
Bottom Line
Building perceptual color similarity search is about aligning technology with how humans actually see. Using CIELAB vectors and k-NN search in Amazon OpenSearch Service bridges that gap, allowing systems to understand color differences the way people do. Whether in fashion, cosmetics, or brand protection, it enables intuitive, human-centric experiences that go far beyond simple RGB filters.
If you are exploring how to make your product search perceptually aware or want to prototype an OpenSearch-based similarity engine, feel free to reach out.
At Reply, we help organizations design intelligent, scalable, and vision-aligned search solutions from proof of concept to production.