Unsupervised Graph Clustering: How Photo Apps Like Google Photos Track Facial Drift Over Time

#go #machinelearning #ai #computerscience

We generally assume that categorizing photos of a person across a decade is a supervised machine learning problem. You upload photos, the system scans the pixels, and it matches the faces.

This assumption fails when applied to long-term data. A person at age 5 and that same person at age 40 possess completely different facial geometries. A naive matching algorithm will classify them as two separate entities. To solve this without user labels, systems abandon pixel matching entirely. They rely on high-dimensional embeddings and unsupervised graph clustering.

The Geometry of Embedding Space

When the system processes a photo, it does not store the image of the face. It processes the image through a neural network to generate an embedding. This embedding is an array of floating-point numbers. These numbers represent exact geometric ratios: the distance between the pupils, the depth of the nose bridge, and the angle of the jaw.

This array plots the face as a single point in a high-dimensional vector space. Two photos of the same person taken on the same day will produce points that are mathematically close together.

Here is how the data structure and distance calculation translate into code:

package main

import "math"

// Embedding is a list of numbers representing the geometry of a face.
// We are using a simplified 1D version so the distance math stays readable.
type Embedding struct {
    Values []float64
}

// Distance calculates the Euclidean distance between two embeddings.
// Small distance = similar faces => likely same person.
// Large distance = different faces => likely different people.
func (a Embedding) Distance(b Embedding) float64 {
    if len(a.Values) != len(b.Values) {
        return math.MaxFloat64
    }

    sum := 0.0
    for i := range a.Values {
        diff := a.Values[i] - b.Values[i]
        sum += diff * diff
    }

    return math.Sqrt(sum)
}

The Mathematics of Facial Drift

The architectural challenge occurs over time. A photo of a user at age 8 produces one embedding. A photo of that user at age 40 produces another. The mathematical distance between these two points in the vector space is massive.

The system solves this using intermediate nodes. Photos taken at ages 12, 18, 25, and 35 populate the space between them. Instead of measuring the direct distance between age 8 and age 40, the system runs a graph clustering algorithm to check for a connected path.

We can simulate this continuous drift in a pipeline:

package main

import (
    "fmt"
    "math/rand"
)

type Photo struct {
    ID    string
    Faces []Face
}

type Face struct {
    ID        string
    PhotoID   string
    Embedding Embedding
}

func main() {
    store := NewClusterStore(0.6, 0.75)

    // Simulate the same person across 10 years of photos.
    // Each embedding drifts slightly from the previous one.
    // No single step is too large, but the total drift is important.
    photos := []Photo{
        {
            ID: "photo_age_08",
            Faces: []Face{{ID: "face_001", PhotoID: "photo_age_08", Embedding: randomEmbedding(0.10, 0.05)}},
        },
        {
            ID: "photo_age_12",
            Faces: []Face{{ID: "face_002", PhotoID: "photo_age_12", Embedding: randomEmbedding(0.22, 0.05)}},
        },
        {
            ID: "photo_age_18",
            Faces: []Face{{ID: "face_003", PhotoID: "photo_age_18", Embedding: randomEmbedding(0.38, 0.05)}},
        },
        {
            ID: "photo_age_25",
            Faces: []Face{{ID: "face_004", PhotoID: "photo_age_25", Embedding: randomEmbedding(0.55, 0.05)}},
        },
        {
            ID: "photo_age_35",
            Faces: []Face{{ID: "face_005", PhotoID: "photo_age_35", Embedding: randomEmbedding(0.68, 0.05)}},
        },
        {
            ID: "photo_age_40",
            Faces: []Face{{ID: "face_006", PhotoID: "photo_age_40", Embedding: randomEmbedding(0.80, 0.05)}},
        },
    }

    fmt.Println("-- processing photos --")
    for _, photo := range photos {
        pipeline(photo, store)
    }

    fmt.Println("\n-- final clusters --")
    for id, cluster := range store.Clusters {
        fmt.Printf("cluster %s: %d faces\n", id, len(cluster.FaceIDs))
    }
}

func randomEmbedding(center, jitter float64) Embedding {
    return Embedding{
        Values: []float64{
            center + (rand.Float64()*2-1)*jitter,
        },
    }
}

Deferring Uncertain Executions and Updating Centroids

Because this is an unsupervised environment with no ground truth data, the system cannot afford false positives. When the distance between a new face and existing clusters is ambiguous, the engine does not force a merge. It creates a new, unnamed cluster and waits.

As the user uploads more photos over months or years, the cluster's mathematical center updates. This is how the system handles variance:

package main

import (
    "fmt"
    "sync"
)

type Cluster struct {
    ID       string
    FaceIDs  []string
    Centroid Embedding
}

type ClusterStore struct {
    mu                  sync.Mutex
    Clusters            map[string]*Cluster
    confidenceThreshold float64
    mergeThreshold      float64
    nextID              int
}

// Assign finds the best existing cluster for a face embedding.
// If no cluster is close enough, a new unnamed cluster is created.
func (cs *ClusterStore) Assign(face Face) string {
    cs.mu.Lock()
    defer cs.mu.Unlock()

    bestClusterID := ""
    bestDistance := cs.confidenceThreshold

    for id, cluster := range cs.Clusters {
        dist := face.Embedding.Distance(cluster.Centroid)
        if dist < bestDistance {
            bestDistance = dist
            bestClusterID = id
        }
    }

    if bestClusterID != "" {
        cs.Clusters[bestClusterID].FaceIDs = append(cs.Clusters[bestClusterID].FaceIDs, face.ID)
        cs.Clusters[bestClusterID].Centroid = cs.updateCentroid(
            cs.Clusters[bestClusterID].Centroid,
            face.Embedding,
            len(cs.Clusters[bestClusterID].FaceIDs),
        )
        return bestClusterID
    }

    newID := cs.newClusterID()
    cs.Clusters[newID] = &Cluster{
        ID:       newID,
        FaceIDs:  []string{face.ID},
        Centroid: face.Embedding,
    }
    return newID
}

// updateCentroid recalculates the average embedding after adding a new face.
// This drift is what allows the cluster to follow a face across years of change.
func (cs *ClusterStore) updateCentroid(current Embedding, incoming Embedding, totalFaces int) Embedding {
    updated := make([]float64, len(current.Values))
    weight := 1.0 / float64(totalFaces)

    for i := range current.Values {
        updated[i] = current.Values[i]*(1-weight) + incoming.Values[i]*weight
    }

    return Embedding{Values: updated}
}

Once the intermediate embeddings fill the vector space and prove the connection, the system executes the merge asynchronously. When we run this pipeline, we see the algorithm successfully grouping all the faces over the 10-year span into a single resolved cluster despite the mathematical drift.

The result is shown below:

Takeaways

Embeddings isolate mathematical structure from raw data. They are superior to direct comparative analysis.
When tracking mutating entities over time, rely on graph connectivity rather than direct pairwise distance.
In unsupervised systems, deferring an uncertain decision is better than forcing an incorrect classification.