DEV Community

Darth Espressius
Darth Espressius

Posted on • Edited on

Graph Features for Graph Machine Learning

In my previous post, we went through what a Graph is in the context of math, as well as node-based features used for machine-learning on graphs.
TLDR: a graph is a set of nodes connected by edges, both of with can contain features. Graph ML is concerned with using graph-based representations to infer these features on new graphs, or in some cases to learn structures existing within a graph.

The previous post covered features which assumed that we wanted to perform inference on the nodes or edges of a graph. Some applications however, tend to warrant entire-graph operations; For example, predicting whether a new molecule is toxic or whether a new protein is compatible with particular enzymes.

To this end, there are three common approaches:

  • Bag of nodes
  • Weisfeiler-Lehman Kernel
  • Graphlets

Bag of Nodes

This is the simplest method. Summary statistics, such as histograms, node degree, centrality measures from node-level operations can be aggregated and used as a graph-level representation. However, this is entirely based on node-level data, which means that larger, structural features of the graph may be missed.

Weisfeiler-Lehman Kernel

The idea behind this method is to iteratively aggregate node-level information, which contain data extending past their local ego graph (their immediate neighbourhood). This method can get mathsy quite quickly;

A label is assigned to each node, such as the node-degree. A hash-function then iteratively assigns each node a new label using the multi-set of current labels within the current node's neighbourhood (multi-set, since some neighbours may have the same degree). This is run a fixed number of time (depending on graph-size and how much data we wish to capture). Each node now encodes the structure of its neighbourhood, which can then be summarized for further processing.

Graphlets

This is a typically combinatoric-ally difficult problem, since it analyses different possible subgraph structures existing in a graph (called graphlets). A graphlet kernel would encode the number of times graphlets of a certain type occur in a graph, typically in a column vector. A similar approach looks at paths that occur in a graph, and encodes the number of times particular degree-sequences occur. A slight variation to this approach uses only the shortest-path between nodes which is used to encode this data.

Postmark Image

Speedy emails, satisfied customers

Are delayed transactional emails costing you user satisfaction? Postmark delivers your emails almost instantly, keeping your customers happy and connected.

Sign up

Top comments (0)

Billboard image

The Next Generation Developer Platform

Coherence is the first Platform-as-a-Service you can control. Unlike "black-box" platforms that are opinionated about the infra you can deploy, Coherence is powered by CNC, the open-source IaC framework, which offers limitless customization.

Learn more

👋 Kindness is contagious

Please leave a ❤️ or a friendly comment on this post if you found it helpful!

Okay