Mage


Knowledge Graph Inference with Neural Embeddings

Recently, some of my work involved knowledge graphs, and I was somewhat surprised to discover how sparse the resources on working with them were. Most of the literature is locked in research papers that are relatively inaccessible unless you have a fair amount of time on your hands.


What is a knowledge graph?

Simply put, a knowledge graph is a collection of facts, each in the form of two entities and the relationship between them: (e1, r, e2). For instance, the fact that "Tom Cruise acted in Mission Impossible" would be represented as:
("Tom Cruise", 'acted_in', "Mission Impossible")
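To make the triple format concrete, here is a minimal sketch of a knowledge graph stored as a Python set of tuples. The helper names (`entities`, `relations`, `has_fact`) are my own for illustration and are not taken from the original code; the Oblivion fact is included only to give the graph a little more content.

```python
# A knowledge graph as a set of (entity_1, relation, entity_2) triples.
graph = {
    ("Tom Cruise", "acted_in", "Mission Impossible"),
    ("Tom Cruise", "acted_in", "Oblivion"),
    ("Ben Stiller", "acted_in", "Dodgeball"),
}

def entities(graph):
    """Collect every entity that appears on either side of a fact."""
    return {e for (e1, _, e2) in graph for e in (e1, e2)}

def relations(graph):
    """Collect every relationship type used in the graph."""
    return {r for (_, r, _) in graph}

def has_fact(graph, e1, r, e2):
    """Check whether a fact is explicitly stored in the graph."""
    return (e1, r, e2) in graph

print(has_fact(graph, "Tom Cruise", "acted_in", "Mission Impossible"))
print(has_fact(graph, "Ben Stiller", "acted_in", "Oblivion"))
```

A set keeps membership checks cheap, which matters later when we repeatedly test whether a randomly generated fact already exists.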

Photo credit: Mission Impossible

Here is an example knowledge graph that could represent movies, genres and actors:

Figure: Knowledge graph of genres, movies, and actors

Link Inference

Now that we have a knowledge graph, we may want to augment the data and predict new relationships that should exist. The knowledge graph above encodes relationships from genres to movies and from movies to actors. If you squint at it, you can plausibly imagine a model that learns that actors generally star in the same genre of movies: Tom Cruise usually stars in action movies, and Ben Stiller usually stars in comedies.

Notice that the knowledge graph above is missing a link that should exist between Ben Stiller and Tropic Thunder (Ben Stiller acted in Tropic Thunder). Note also that we should be able to infer the genre each of these actors tends to act in: Tom Cruise tends to act in action movies, while Jack Black and Ben Stiller tend to star in comedies. From this information, we should be able to infer that there is likely an 'acted_in' relationship between Ben Stiller and Tropic Thunder.

Learning

What learning algorithm can we use to infer this information? Recall how word embeddings are trained; we can apply a similar strategy here. We can create a vector embedding for each entity and each relationship type, and train the embeddings such that, for every fact (e1, r, e2) in the graph:

e1 + r ≈ e2

We can imagine a suitable set of embeddings in which each actor's vector, shifted by the 'acted_in' vector, lands near the movies that actor appeared in.

In this case, the closest movie to the vector Ben Stiller + 'acted_in' is Dodgeball, but the second closest is Tropic Thunder.
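Here is a small sketch of that nearest-neighbor idea. The 2-D vectors below are hand-picked by me purely for illustration (they are not trained values), and I am assuming the score is the Euclidean distance between e1 + r and e2, where a lower score means a more plausible fact.

```python
import math

# Hypothetical 2-D embeddings, hand-picked for illustration (not trained values).
entity_vec = {
    "Ben Stiller": (1.0, 1.0),
    "Tom Cruise": (-1.0, 1.0),
    "Dodgeball": (1.1, 2.0),
    "Tropic Thunder": (0.6, 2.2),
    "Mission Impossible": (-1.0, 2.0),
    "Oblivion": (-1.2, 2.1),
}
relation_vec = {"acted_in": (0.0, 1.0)}

def score(e1, r, e2):
    """Euclidean distance between e1 + r and e2; lower means more plausible."""
    h, rel, t = entity_vec[e1], relation_vec[r], entity_vec[e2]
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, rel, t)))

def rank_movies(actor, movies):
    """Sort candidate movies from most to least plausible for the actor."""
    return sorted(movies, key=lambda m: score(actor, "acted_in", m))

movies = ["Mission Impossible", "Oblivion", "Tropic Thunder", "Dodgeball"]
print(rank_movies("Ben Stiller", movies))
# ['Dodgeball', 'Tropic Thunder', 'Mission Impossible', 'Oblivion']
```

With these toy vectors, Dodgeball sits closest to Ben Stiller + 'acted_in' and Tropic Thunder comes second, which is the kind of ranking we hope training will produce on its own.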

Photo credit: Dodgeball

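As for training the model, we can try to maximize the difference between the score of a relationship that doesn't exist, for instance (Tom Cruise, is_genre, Ben Stiller), and the score of a relationship that does, for instance (Ben Stiller, 'acted_in', Dodgeball). Generating the non-existent relationships is referred to as negative sampling, and it is also how word vectors are trained. Concretely:

loss = max(0, positive - negative + MARGIN)

where positive is the score of the fact that exists and negative is the score of the sampled fact that does not.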

Specifically, this encourages the positive score to be less than the negative score by at least MARGIN. For instance, one possible solution would be:

positive = 0
negative = MARGIN
loss = positive - negative + MARGIN = 0
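As a quick sanity check of that arithmetic, here is a sketch of the loss as a plain Python function. I am assuming the usual hinge form (clamped at zero so that well-separated pairs stop contributing), and the margin value of 1.0 is an arbitrary choice for the example.

```python
MARGIN = 1.0  # assumed value for illustration

def margin_loss(positive, negative, margin=MARGIN):
    """Zero once the positive score is at least `margin` below the negative score."""
    return max(0.0, positive - negative + margin)

print(margin_loss(0.0, MARGIN))  # the solution above: 0.0
print(margin_loss(2.0, 2.0))     # no separation yet: 1.0
print(margin_loss(0.0, 5.0))     # separated by more than the margin: 0.0
```

The clamp is what keeps the model from pushing negatives away forever: once a pair is separated by the margin, it produces no gradient.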

Let's build it

I built this in PyTorch and posted the code here. Below, I'll point out some of the more interesting aspects of the implementation that I found helped make training more stable.

Negative Sampling

The goal of negative sampling is to produce a fact that is incorrect, for instance ("Ben Stiller", 'acted_in', "Oblivion"). My pseudocode is as follows:

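import random

def generate_negative_sample(graph, entities, relations):
    # Keep drawing random triples until we find one that is not a known fact
    while True:
        fact = (random.choice(entities), random.choice(relations), random.choice(entities))
        if fact not in graph:
            return fact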
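A common variation, which is not what the pseudocode above does, is to corrupt only one side of a true fact so that the negative stays close to the positive it is compared against. Here is a minimal sketch of that alternative; `corrupt_fact`, its parameters, and the tiny graph are hypothetical names and data of my own.

```python
import random

def corrupt_fact(fact, graph, entities, rng=random, max_tries=100):
    """Swap one side of a true fact for a random entity until the result is not in the graph."""
    e1, r, e2 = fact
    for _ in range(max_tries):
        candidate = rng.choice(entities)
        # Flip a coin to decide whether to replace the first or the second entity
        negative = (candidate, r, e2) if rng.random() < 0.5 else (e1, r, candidate)
        if negative not in graph:
            return negative
    raise RuntimeError("could not find a negative sample")

graph = {
    ("Ben Stiller", "acted_in", "Dodgeball"),
    ("Tom Cruise", "acted_in", "Oblivion"),
}
entities = ["Ben Stiller", "Tom Cruise", "Dodgeball", "Oblivion"]
negative = corrupt_fact(("Ben Stiller", "acted_in", "Dodgeball"), graph, entities,
                        rng=random.Random(0))
print(negative)
```

Passing a seeded `random.Random` makes the sampling reproducible, which is handy when debugging unstable training runs.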

I found that learning was more stable when the negative samples were oversampled relative to the number of correct facts.

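def train(fact, embedding_model, optimizer, num_negative_samples=40):
    # Oversample negative entries: the same fact is scored repeatedly, and
    # embedding_model(fact) is assumed to draw a new negative sample on each call
    total_loss = 0.0
    for _ in range(num_negative_samples):
        embedding_model.zero_grad()
        loss = embedding_model(fact)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()  # .item() replaces the deprecated loss.data[0]
    # Average over the number of iterations actually run
    return total_loss / num_negative_samples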
Figure: Training loss
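If you want to experiment without PyTorch, the whole idea fits in a short pure-Python sketch. This is not my actual implementation: it assumes a squared-distance score, hand-written gradient steps, arbitrary hyperparameters, and a toy graph whose edges (including the direction of 'is_genre') I made up for the example.

```python
import random

def train_embeddings(graph, dim=8, margin=1.0, lr=0.05, epochs=200,
                     negatives_per_fact=5, seed=0):
    rng = random.Random(seed)
    facts = sorted(graph)
    entities = sorted({e for e1, _, e2 in facts for e in (e1, e2)})
    relations = sorted({r for _, r, _ in facts})
    # One small random vector per entity and per relation (names assumed distinct)
    emb = {name: [rng.uniform(-0.5, 0.5) for _ in range(dim)]
           for name in entities + relations}

    def residual(e1, r, e2):
        return [a + b - c for a, b, c in zip(emb[e1], emb[r], emb[e2])]

    def score(e1, r, e2):
        # Squared distance between e1 + r and e2; lower means more plausible
        return sum(x * x for x in residual(e1, r, e2))

    def step(fact, res, direction):
        # d(score)/d(e1) = d(score)/d(r) = 2 * res and d(score)/d(e2) = -2 * res.
        # direction=+1 lowers the fact's score, direction=-1 raises it.
        e1, r, e2 = fact
        for i, g in enumerate(res):
            delta = direction * lr * 2 * g
            emb[e1][i] -= delta
            emb[r][i] -= delta
            emb[e2][i] += delta

    history = []
    for _ in range(epochs):
        total = 0.0
        for fact in facts:
            for _ in range(negatives_per_fact):  # oversample negatives per fact
                negative = fact
                while negative in graph:  # resample until not a known fact
                    negative = (rng.choice(entities), rng.choice(relations),
                                rng.choice(entities))
                pos_res, neg_res = residual(*fact), residual(*negative)
                loss = max(0.0, sum(x * x for x in pos_res)
                           - sum(x * x for x in neg_res) + margin)
                if loss > 0:
                    step(fact, pos_res, +1)      # pull the true fact together
                    step(negative, neg_res, -1)  # push the false fact apart
                total += loss
        history.append(total / (len(facts) * negatives_per_fact))
    return score, history

# Toy graph for illustration only; the edges are assumptions, not my dataset.
toy_graph = {
    ("Tom Cruise", "acted_in", "Mission Impossible"),
    ("Tom Cruise", "acted_in", "Oblivion"),
    ("Ben Stiller", "acted_in", "Dodgeball"),
    ("Jack Black", "acted_in", "Tropic Thunder"),
    ("Mission Impossible", "is_genre", "Action"),
    ("Oblivion", "is_genre", "Action"),
    ("Dodgeball", "is_genre", "Comedy"),
    ("Tropic Thunder", "is_genre", "Comedy"),
}
score, history = train_embeddings(toy_graph)
print("first epoch loss:", history[0], "last epoch loss:", history[-1])
for movie in ["Dodgeball", "Tropic Thunder", "Mission Impossible", "Oblivion"]:
    print(movie, score("Ben Stiller", "acted_in", movie))
```

The `negatives_per_fact` parameter plays the same role as the oversampling loop above: each true fact is compared against several freshly sampled negatives per pass. Scores from this sketch will not match the numbers reported below, since the model, data, and hyperparameters all differ.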

Recall that the goal of this model is to infer that Ben Stiller is more likely to have acted in Tropic Thunder than in Mission Impossible or Oblivion. Here are the results (remember, according to our formulation, a lower score means more likely).

score(('ben stiller', 'acted_in', 'dodgeball')) = 1.5305
score(('ben stiller', 'acted_in', 'tropic thunder')) = 2.5801
score(('ben stiller', 'acted_in', 'mission impossible')) = 3.4038
score(('ben stiller', 'acted_in', 'oblivion')) = 3.3958

So this model is able to figure out that Ben Stiller is more likely to have acted in Tropic Thunder, a comedy, than in the action movies, and it learned this by modeling the links in the graph.


Photo credit: Oblivion
