Rizan Bhandari

Originally published at acchyut.com.np

Deep learning through the lens of Felix Klein's Erlangen programme

Geometric deep learning, starting with the XOR affair

I've just spent two days at an AI school (ANAIS) in Kathmandu listening to lectures on geometric deep learning from bright folks from Oxford, TU Wien, and other top institutions. The geometric deep learning lectures came from Michael Bronstein (who coined the term; he needs no introduction, really).

The event card, a notebook, and my keyboard

The program is super organised, covering symmetry groups, graph neural networks and fundamental principles underlying modern AI before it eventually catches up to modern trends. Papers. Maths. Code.

I had checked the list of confirmed speakers before I applied, but I wasn't expecting them to start with the XOR problem from 1969: Marvin Minsky and Seymour Papert's proof that a single-layer perceptron couldn't represent XOR.

It's clever, and it's a good starting point for understanding why geometric deep learning matters.
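To make the XOR affair concrete, here's a minimal sketch (my own toy numpy code, not from the lectures): a brute-force search over weights shows that no single threshold unit fits XOR, while a hand-wired hidden layer does.

```python
import numpy as np

# The four XOR input/output pairs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

def step(z):
    """Heaviside threshold, the original perceptron nonlinearity."""
    return (z > 0).astype(int)

# A single perceptron computes step(w . x + b): one straight line in the plane.
# XOR needs (0,1) and (1,0) on one side and (0,0), (1,1) on the other,
# which no line can do. A brute-force search over weights confirms it:
found = any(
    np.array_equal(step(X @ np.array([w1, w2]) + b), y)
    for w1 in np.linspace(-2, 2, 41)
    for w2 in np.linspace(-2, 2, 41)
    for b in np.linspace(-2, 2, 41)
)
print("single perceptron solves XOR:", found)  # False

# One hidden layer fixes it: XOR(x1, x2) = OR(x1, x2) AND NOT AND(x1, x2).
def two_layer_xor(x):
    h_or = step(x @ np.array([1, 1]) - 0.5)   # fires when x1 OR x2
    h_and = step(x @ np.array([1, 1]) - 1.5)  # fires when x1 AND x2
    return step(h_or - h_and - 0.5)           # OR minus AND: exactly XOR

print("two-layer network:", two_layer_xor(X))  # [0 1 1 0]
```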


The human element

Bronstein's opening lecture traced an interesting path. He went back to the 1950s, to Frank Rosenblatt's perceptron experiments, showed newspaper clippings from 1958 calling it the "first serious rival to the human brain", and then Minsky and Papert's book proving perceptrons couldn't even represent XOR.

I've attended a bunch of lectures over the years, and mostly it's: "here's a neural network, here's what I need you to do, here's the code". Some treat the field like it emerged in 2012 with AlexNet.

It's interesting that the book that killed AI funding for a decade also contained the idea behind the Group Invariance theorem. It says: if a neural network is invariant to a group, its output can be expressed as functions of the orbits of that group. In English: if a function (a neural network) is designed to ignore how things are moved around, it still focuses on the essential part that doesn't change. Say you pick up a cat upside down. It might be hissing at you, but it's still a cat. Rotating or flipping the cat doesn't change what the object is. Still the same object, albeit pretty mad at you.

Essentially, the idea behind modern geometric deep learning.
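Here's a tiny sketch of that orbit idea, under my own toy assumptions (2x2 "images", the group of 90-degree rotations): averaging any feature over the group's orbit makes it invariant by construction.

```python
import numpy as np

def orbit(img):
    """All four 90-degree rotations of img: the group orbit of the input."""
    return [np.rot90(img, k) for k in range(4)]

def f(img):
    """Some arbitrary, non-invariant feature extractor (toy example)."""
    return float(img[0, 0] * 2.0 + img[1, 1])

def f_invariant(img):
    """Averaging f over the orbit is rotation-invariant by construction."""
    return float(np.mean([f(g) for g in orbit(img)]))

cat = np.array([[1.0, 0.0],
                [3.0, 2.0]])
upside_down_cat = np.rot90(cat, 2)

print(f(cat), f(upside_down_cat))                      # differ: f sees orientation
print(f_invariant(cat), f_invariant(upside_down_cat))  # equal: the orbit average doesn't
```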


If you think about it, geometric DL is super intuitive

I understand convolutions, but Bronstein's connections are interesting:

Felix Klein’s Erlangen programme (1872): Geometry is the study of properties that are invariant under transformations.

CNNs (1980s): Image recognition is the study of features that are invariant under translations.

Dates aren't important here (or maybe they are); notice that it's basically the same idea, just over a century apart. If you think about it, a CNN just tries to learn the "geometry" of images. Learned geometry means accurate detection.

Klein’s insight was that you can classify geometries by their symmetry groups. Euclidean geometry preserves distances and angles. Affine geometry preserves parallelism but not angles. Projective geometry preserves only intersections.

CNNs do the same thing for images. A teddy is still a teddy whether it sits in the corner of the frame or dead centre. Translation invariance isn't a new trick to make neural networks better. It's a geometric "prior" about how neural networks should learn from images. Cool stuff. The progression makes sense.
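A quick numpy sketch of that prior (my illustration, not Bronstein's code), using a 1-D signal and circular convolution so the symmetry is exact: convolution is translation-equivariant, and global pooling on top makes the whole thing translation-invariant.

```python
import numpy as np

rng = np.random.default_rng(0)
signal = rng.normal(size=16)          # a 1-D "image"
kernel = np.array([1.0, -2.0, 1.0])   # a tiny shared-weight filter

def circular_conv(x, k):
    """Slide the same filter over every position (circular boundary)."""
    return np.array([np.dot(k, np.roll(x, -i)[:len(k)]) for i in range(len(x))])

shifted = np.roll(signal, 5)  # translate the input by 5 positions

# Equivariance: convolving the shifted signal == shifting the convolved signal.
print(np.allclose(circular_conv(shifted, kernel),
                  np.roll(circular_conv(signal, kernel), 5)))  # True

# Invariance: global max pooling on top forgets position entirely.
print(np.isclose(circular_conv(signal, kernel).max(),
                 circular_conv(shifted, kernel).max()))  # True
```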

Back to the lecture, the plan was roughly:

  • grids -> translation symmetry -> CNNs
  • graphs -> permutation symmetry -> graph neural networks
  • manifolds -> isometry/gauge symmetry -> geometric CNNs

Do you notice the pattern? Each architecture encodes a unique geometric prior about its input. Think about it. If it's a cat image (a grid of pixels), you want CNNs to recognise the cat even if it's shifted to a different corner of the frame. In a graph of a social network, the order of the nodes doesn't matter; it's the overall shape of the graph that does. And on a manifold (a 3D surface, let's say), it shouldn't matter whether the object is rotated or bent, as long as it's intrinsically the same object.
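The graph case in miniature, again my own toy sketch: a DeepSets-style sum readout over node features doesn't care how you label the nodes, while anything that bakes node order into the representation does.

```python
import numpy as np

rng = np.random.default_rng(1)
nodes = rng.normal(size=(5, 3))       # 5 nodes, 3 features each (a tiny "graph")
perm = np.array([2, 0, 4, 1, 3])      # an arbitrary relabelling of the nodes
w = rng.normal(size=(3, 4))           # shared per-node weights

def deepsets_readout(x, w):
    """Apply the same map to every node, then sum: row order can't matter."""
    return np.maximum(x @ w, 0).sum(axis=0)  # one ReLU layer per node, then sum

print(np.allclose(deepsets_readout(nodes, w),
                  deepsets_readout(nodes[perm], w)))        # True: permutation-invariant

# A readout that flattens the node order breaks the symmetry:
print(np.allclose(nodes.flatten(), nodes[perm].flatten()))  # False
```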


Fundamental yet very relevant

The ideas behind geometric networks are nothing new, yet they're somehow also the current state of the art. There are obviously alterations (many; like SwiGLU instead of ReLU in some blocks), but DeepMind's AlphaFold2 is basically a geometric model that uses equivariant graph networks. Rotational equivariance means it can learn 3D structures during protein folding regardless of how they're oriented. AlphaFold's creators won the Nobel Prize in Chemistry (2024). Quite relevant.


Erlangen programme, modernized

Bronstein's framework is basically a modernized version of Klein's Erlangen programme. Instead of asking "what worked on ImageNet?", you'd ask "what symmetries does my data have?"

  • images -> translation symmetry -> CNNs
  • graphs -> permutation symmetry -> GNNs
  • 3D point clouds -> rotation + permutation symmetry -> Equivariant GNNs
  • sequences -> time warping -> LSTMs
  • sets -> permutation invariance -> DeepSets/transformers

Simply put: design models that respect and exploit geometric priors. Without them, the curse of dimensionality bites, and you'd need on the order of O(ε^(-d)) samples to reach accuracy ε in d dimensions.
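A back-of-envelope illustration of that O(ε^(-d)) scaling (plain Python, ε = 0.1 assumed):

```python
# Samples needed for accuracy eps without any geometric prior: ~eps^(-d).
eps = 0.1
for d in (1, 2, 5, 10, 100):
    print(f"d={d:>3}: ~{eps ** -d:.0e} samples")
# d=  1: ~1e+01 samples
# d=  2: ~1e+02 samples
# d=  5: ~1e+05 samples
# d= 10: ~1e+10 samples
# d=100: ~1e+100 samples -- hopeless without exploiting structure
```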

Notice how we don’t look at the architecture first.


Closing thoughts

ML is function approximation: you're fitting a function to some data. More data, better results, but it's also an art in many ways. If you find the right invariances in your data's structure, you can build architectures that respect and exploit those priors, and you end up with models that learn faster and generalize better. Quite intuitive.

The geometric lens is an interesting way to see how CNNs are more than a clever trick with shared weights. Also interesting that it took an AI professor and slides with images of 19th-century mathematicians for me to get why the Group Invariance theorem matters.

The program was quite technical. Teodora Reu's lecture on group theory assumed comfort with abstract algebra, and Ismail Ceylan's graph theory sessions went deep into the Weisfeiler-Lehman hierarchy. So far so good.


Footnotes

The slides are full of people. I've come to realise that I tend to ignore the bits about people in favor of the content. It has only recently dawned on me that researchers/scientists are real people. It's also something Prof. Rajani Malla would say often.

Interesting read on the perceptron controversy

Felix Klein's Erlangen programme (1872), laid out in "A Comparative Review of Recent Researches in Geometry", unifies different geometries by characterizing their transformation groups. It applies to ML too; it just took us 150 years to realize that.

The Weisfeiler-Lehman test is named (in part) after Boris Weisfeiler, who disappeared in 1985 while hiking in Chile during Pinochet's dictatorship. He never got to see its role in modern GNNs, but his contribution to graph theory lives on there.

Kunihiko Fukushima's Neocognitron (1980) is interesting; it resembles a modern CNN: deep architecture, local connectivity, weight sharing, pooling, ReLU. A near-miss; it just needed backprop.

It's not supposed to be a detailed account (unless they're paying me, which is unlikely), but I might add interesting bits from Alisia's sessions and more.

Writing this after a full day: 9-6 of lectures + lab, then debugging, and now this post (it's 1 am). The kind of tired where you're not sure if the ideas are good or if you're just sleep-deprived. We'll see. I've missed lots of details and sessions by amazing people.

This post is also published on my personal site. Conflicts of interest: none. I'm not affiliated with NAAMII.
