
In an agile Computer Vision startup, global accuracy is a vanity metric.
It tells you the model is working.
It doesn’t tell you what to fix next.
To move fast, you need actionable signals, not dashboards.
Here’s how we transform raw embeddings into sprint-ready decisions.
1. Stop Calculating Accuracy. Start Finding Confusion.
A single accuracy score hides local failures.
Instead of asking “How accurate is the model?”, ask:
Where exactly is the model failing?
We use 1-Nearest Neighbor (1-NN) evaluation to expose structural confusion between visually similar items.
```python
import numpy as np
from sklearn.metrics import pairwise_distances

# Compute pairwise distances between all embeddings
D = pairwise_distances(embeddings, metric="euclidean")

# Ignore self-comparisons
np.fill_diagonal(D, np.inf)

# Identify each sample's closest neighbor
closest_idx = np.argmin(D, axis=1)
is_correct = labels == labels[closest_idx]
```
Instead of celebrating 88% accuracy, we extract:
- Which classes are confused
- How often confusion happens
- Whether errors are isolated or concentrated
This immediately narrows the problem space.
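One way to surface those signals is to count which label pairs the 1-NN errors fall into. The sketch below is a minimal illustration; the random `embeddings` and `labels` are placeholders for your own data:

```python
import numpy as np
from collections import Counter
from sklearn.metrics import pairwise_distances

# Toy stand-ins for real embeddings and labels
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 16))
labels = rng.integers(0, 5, size=200)

D = pairwise_distances(embeddings, metric="euclidean")
np.fill_diagonal(D, np.inf)
closest_idx = np.argmin(D, axis=1)
is_correct = labels == labels[closest_idx]

# Count unordered (true class, neighbor class) pairs among the errors
pairs = Counter(
    tuple(sorted((int(labels[i]), int(labels[closest_idx[i]]))))
    for i in np.where(~is_correct)[0]
)

# The top pairs are the confusion hotspots worth a ticket
for pair, count in pairs.most_common(3):
    print(pair, count)
```

A concentrated top pair points at a specific class boundary; a long flat tail suggests noise spread across the dataset.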
2. Margin: Measuring Model Uncertainty
Accuracy is binary.
Confidence is continuous.
We define Margin as the gap between the closest and second closest neighbors:
Margin = Dist_{2nd} - Dist_{1st}
If the margin is near zero, the model is effectively guessing between two nearly identical items.
```python
# Sort distances per sample
sorted_dists = np.sort(D, axis=1)

# Margin: gap between the two closest neighbors
margin = sorted_dists[:, 1] - sorted_dists[:, 0]

# Flag unstable samples (threshold is tuned per embedding space)
low_confidence = margin < 0.05
```
Low margin samples are early warning signals.
They tell you where instability lives before accuracy visibly drops.
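To turn per-sample margins into a class-level signal, you can rank classes by their mean margin. A minimal sketch, again with placeholder random data:

```python
import numpy as np
from sklearn.metrics import pairwise_distances

# Toy stand-ins for real embeddings and labels
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 16))
labels = rng.integers(0, 5, size=200)

D = pairwise_distances(embeddings)
np.fill_diagonal(D, np.inf)
sorted_dists = np.sort(D, axis=1)
margin = sorted_dists[:, 1] - sorted_dists[:, 0]

# Mean margin per class: lower means a less stable boundary
class_margin = {int(c): float(margin[labels == c].mean()) for c in np.unique(labels)}
ranked = sorted(class_margin, key=class_margin.get)
print(ranked)  # classes from least to most stable
```

The head of that ranking is where targeted data collection pays off first.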
3. Visualization for Intuition (Not Decisions)
Metrics drive action.
Visualization builds intuition.
We use PCA or t-SNE to project embeddings into 2D space:
```python
from sklearn.manifold import TSNE

tsne_results = TSNE(
    n_components=2,
    perplexity=30,
    random_state=42,
).fit_transform(embeddings)
```
Overlapping clusters often explain:
- Why two classes are confused
- Whether boundaries are weak
- Whether the representation itself lacks separation
Visualization doesn’t replace metrics; it explains them.
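To actually look at the projection, a simple scatter colored by class is enough. A sketch using matplotlib with placeholder data (the filename `embedding_map.png` is arbitrary):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless rendering
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Toy stand-ins for real embeddings and labels
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 16))
labels = rng.integers(0, 5, size=200)

tsne_results = TSNE(
    n_components=2,
    perplexity=30,
    random_state=42,
).fit_transform(embeddings)

# One color per class; overlapping clouds hint at confusion pairs
plt.scatter(tsne_results[:, 0], tsne_results[:, 1], c=labels, cmap="tab10", s=10)
plt.colorbar(label="class")
plt.savefig("embedding_map.png", dpi=150)
```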
4. The Operational Matrix: Converting Signals to Ownership
Analysis only matters if someone owns the fix.
We translate technical signals into clear action items:
| Signal | Meaning | Action Item | Owner |
|---|---|---|---|
| High Intra-class Distance | Same class samples are visually inconsistent | Clean dataset (remove low-quality samples) | Data Ops |
| Strong Confusion Pair | Two distinct classes overlap heavily | Add discriminative features or review labeling | CV / Product |
| Low Margin Cluster | Category boundary is unstable | Targeted data collection or fine-tuning | ML Team |
This is where evaluation becomes execution.
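The first row of the matrix, for example, can be computed directly from the distance matrix. A sketch with placeholder data; the threshold for "high" is something you'd calibrate per embedding space:

```python
import numpy as np
from sklearn.metrics import pairwise_distances

# Toy stand-ins for real embeddings and labels
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 16))
labels = rng.integers(0, 5, size=200)

D = pairwise_distances(embeddings)

# Mean pairwise distance inside each class: high values suggest
# visually inconsistent samples -> a Data Ops cleaning ticket
intra_class = {}
for c in np.unique(labels):
    idx = np.where(labels == c)[0]
    block = D[np.ix_(idx, idx)]
    intra_class[int(c)] = float(block[np.triu_indices(len(idx), k=1)].mean())

for c, v in sorted(intra_class.items(), key=lambda kv: -kv[1]):
    print(f"class {c}: mean intra-class distance = {v:.2f}")
```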
5. The Agile Loop
Every iteration follows the same structure:
1. Run embedding evaluation
2. Extract top confusion pairs
3. Rank unstable classes by margin
4. Convert top issues into sprint tickets
5. Re-run metrics after fixes
Metrics → Ownership → Action → Re-measure
No guesswork.
No intuition battles.
No blind retraining.
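To make the re-measure step cheap, the evaluation can be wrapped in a single function you call every iteration. A minimal sketch (the `evaluate` name and report keys are illustrative, not from the project):

```python
import numpy as np
from sklearn.metrics import pairwise_distances

def evaluate(embeddings, labels):
    """Return 1-NN accuracy and mean margin for one iteration."""
    D = pairwise_distances(embeddings)
    np.fill_diagonal(D, np.inf)
    closest_idx = np.argmin(D, axis=1)
    sorted_dists = np.sort(D, axis=1)
    return {
        "nn_accuracy": float((labels == labels[closest_idx]).mean()),
        "mean_margin": float((sorted_dists[:, 1] - sorted_dists[:, 0]).mean()),
    }

# Re-run after each sprint's fixes and diff against the previous report
rng = np.random.default_rng(0)
report = evaluate(rng.normal(size=(200, 16)), rng.integers(0, 5, size=200))
print(report)
```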
What We Learned
- Most persistent errors were structural, not architectural.
- Margin was a better early warning signal than accuracy.
- Concentrated confusion mattered more than global percentage.
- Ownership accelerated improvement more than model complexity.
The biggest shift wasn’t technical.
It was operational.
The Bottom Line
In real world Computer Vision systems, the difference between stable and unstable performance isn’t a new architecture.
It’s a tighter feedback loop.
If a metric doesn’t lead to a sprint ticket, it’s just noise.
The best models aren’t the ones with the highest accuracy.
They’re the ones whose failures are measurable, owned, and shrinking every iteration.
This project is open source. Feel free to explore the code on GitHub and try it yourself:
https://github.com/itayeylath/metrics-to-actions
