<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Antonio Filipovic</title>
    <description>The latest articles on DEV Community by Antonio Filipovic (@antoniofilipovic).</description>
    <link>https://dev.to/antoniofilipovic</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F985447%2F24d35c2b-3ea9-42d1-aee4-f65f300d45ca.jpeg</url>
      <title>DEV Community: Antonio Filipovic</title>
      <link>https://dev.to/antoniofilipovic</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/antoniofilipovic"/>
    <language>en</language>
    <item>
      <title>Link Prediction With node2vec in Physics Collaboration Network</title>
      <dc:creator>Antonio Filipovic</dc:creator>
      <pubDate>Fri, 16 Jun 2023 14:14:42 +0000</pubDate>
      <link>https://dev.to/memgraph/link-prediction-with-node2vec-in-physics-collaboration-network-19dh</link>
      <guid>https://dev.to/memgraph/link-prediction-with-node2vec-in-physics-collaboration-network-19dh</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;After you have successfully created a &lt;strong&gt;&lt;a href="https://memgraph.com/blog/online-node2vec-recommendation-system"&gt;dynamic recommendation system&lt;/a&gt;&lt;/strong&gt;, this time &lt;strong&gt;MAGE&lt;/strong&gt; will teach you how to generate &lt;strong&gt;&lt;a href="https://en.wikipedia.org/wiki/Link_prediction"&gt;link predictions&lt;/a&gt;&lt;/strong&gt; using a new spell called &lt;strong&gt;node2vec&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If you don't know what &lt;strong&gt;node2vec&lt;/strong&gt; is or what &lt;strong&gt;node embeddings&lt;/strong&gt; are, we've got you covered with &lt;strong&gt;two&lt;/strong&gt; blog posts for a deeper understanding:&lt;/p&gt;

&lt;p&gt;1) &lt;a href="https://memgraph.com/blog/introduction-to-node-embedding"&gt;&lt;em&gt;Introduction to node embedding&lt;/em&gt;&lt;/a&gt; - In this article, you can check out what node embeddings are, where we use them, why we use them, and how we can get embeddings from a graph.&lt;br&gt;
2) &lt;a href="https://memgraph.com/blog/how-node2vec-works"&gt;&lt;em&gt;How node2vec works&lt;/em&gt;&lt;/a&gt; - After the first blog post, you should have an idea of how &lt;strong&gt;node2vec&lt;/strong&gt; works. But if you want to fully understand the algorithm and its benefits, and see how it works on a few examples, take a look at this node2vec blog post, which covers everything mentioned.&lt;/p&gt;

&lt;p&gt;As already mentioned, &lt;strong&gt;link prediction&lt;/strong&gt; refers to the task of predicting missing links or links that are likely to occur in the future. In this tutorial, we will make use of the &lt;strong&gt;MAGE&lt;/strong&gt; spell called &lt;a href="https://memgraph.com/docs/mage/query-modules/python/node2vec"&gt;&lt;strong&gt;node2vec&lt;/strong&gt;&lt;/a&gt;. We will also use &lt;strong&gt;Memgraph&lt;/strong&gt; to store the data and &lt;a href="https://github.com/memgraph/gqlalchemy"&gt;&lt;strong&gt;gqlalchemy&lt;/strong&gt;&lt;/a&gt; to connect to it from a Python application. The dataset will be similar to the one used in this paper: &lt;strong&gt;&lt;a href="https://arxiv.org/pdf/1705.02801.pdf"&gt;Graph Embedding Techniques, Applications, and Performance: A Survey&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Don't worry, you are in safe hands: &lt;strong&gt;MAGE&lt;/strong&gt; will guide you through dataset parsing, the creation of the queries used to import the data into Memgraph, &lt;strong&gt;embedding calculation&lt;/strong&gt; with the &lt;strong&gt;node2vec&lt;/strong&gt; algorithm in MAGE, and the metrics report.&lt;/p&gt;

&lt;p&gt;Now let's get to the fun part.&lt;/p&gt;
&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;For this to work, you will need:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The &lt;strong&gt;&lt;a href="https://memgraph.com/docs/mage/installation"&gt;MAGE graph library&lt;/a&gt;&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://memgraph.com/product/lab"&gt;Memgraph Lab&lt;/a&gt;&lt;/strong&gt; - the graph explorer for querying Memgraph and visualizing graphs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/memgraph/gqlalchemy"&gt;gqlalchemy&lt;/a&gt;&lt;/strong&gt; - a Python driver and object graph mapper (OGM)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You can also try out &lt;strong&gt;MAGE&lt;/strong&gt; on &lt;a href="https://playground.memgraph.com/lesson/game-of-thrones-deaths-introductions-3-mage?step=intro"&gt;Memgraph Playground&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  Contents
&lt;/h2&gt;

&lt;p&gt;This is how we will set up our tutorial:&lt;br&gt;
1) Dataset and query import&lt;br&gt;
2) Splitting edges into test and train sets&lt;br&gt;
3) Run node2vec on the train set to generate node embeddings&lt;br&gt;
4) Get potential edges from embeddings&lt;br&gt;
5) Rank potential edges to get top K predictions&lt;br&gt;
6) Compare predicted edges with the test set&lt;/p&gt;
&lt;h2&gt;
  
  
  1. Dataset and query import
&lt;/h2&gt;

&lt;p&gt;We will work on the &lt;a href="http://snap.stanford.edu/data/ca-HepPh.html"&gt;High Energy Physics Collaboration Network&lt;/a&gt;. The dataset contains &lt;strong&gt;12008&lt;/strong&gt; nodes and &lt;strong&gt;118521&lt;/strong&gt; edges. &lt;strong&gt;MAGE&lt;/strong&gt; has prepared a script that will help you parse the dataset and import it into &lt;strong&gt;Memgraph&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;After you have downloaded the dataset from the link above, you should see the following contents:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Directed graph (each unordered pair of nodes is saved once): CA-HepPh.txt 
# Collaboration network of Arxiv High Energy Physics category (there is an edge if authors co-authored at least one paper)
# Nodes: 12008 Edges: 237010
# FromNodeId    ToNodeId
17010   1943
17010   2489
17010   3426
17010   4049
17010   16961
17010   17897
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The dataset description says it's a directed graph containing 237010 edges, while earlier we said it contains 118521 edges. Actually, both are true; it depends on your point of view. &lt;/p&gt;

&lt;p&gt;The graph in question is &lt;strong&gt;directed&lt;/strong&gt;, but it contains edges in both directions: from node &lt;em&gt;u&lt;/em&gt; to node &lt;em&gt;v&lt;/em&gt; and from node &lt;em&gt;v&lt;/em&gt; to node &lt;em&gt;u&lt;/em&gt;, i.e., &lt;em&gt;u⟶v&lt;/em&gt; and &lt;em&gt;u⟵v&lt;/em&gt;. The direction means that author &lt;em&gt;u&lt;/em&gt; co-authored at least one paper with author &lt;em&gt;v&lt;/em&gt;. Since co-authoring goes both ways, we can act as if the graph is undirected with only one edge, &lt;em&gt;u - v&lt;/em&gt;. The script below will create exactly 118521 undirected edges. So all is good. Phew. &lt;/p&gt;

&lt;p&gt;We will import these &lt;strong&gt;118521&lt;/strong&gt; edges and act as if they are undirected. &lt;a href="https://github.com/memgraph/mage/blob/main/python/node2vec.py"&gt;The &lt;strong&gt;Node2Vec&lt;/strong&gt; algorithm in MAGE&lt;/a&gt; accepts a parameter that determines whether to treat the graph stored in &lt;strong&gt;Memgraph&lt;/strong&gt; as directed or undirected. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: &lt;strong&gt;Memgraph&lt;/strong&gt; only stores directed edges, but the &lt;strong&gt;Node2Vec&lt;/strong&gt; algorithm saves the day for us in this case.&lt;/p&gt;
&lt;/blockquote&gt;
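&lt;p&gt;To make the "undirected" treatment concrete, here is a minimal Python sketch (an illustration only, not MAGE's actual implementation) of what treating stored directed edges as undirected amounts to: every stored edge becomes walkable in both directions.&lt;/p&gt;

```python
from collections import defaultdict

def undirected_adjacency(edges):
    """Build an adjacency list that ignores edge direction:
    every stored edge (u, v) is walkable both ways."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)  # the stored direction u -> v
        adj[v].add(u)  # and the reverse direction v -> u
    return adj

adj = undirected_adjacency([(17010, 1943), (17010, 2489)])
print(adj[1943])  # 1943 reaches 17010 even though only 17010 -> 1943 is stored
```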

&lt;p&gt;Here is the function that parses the edges from the file. It returns a &lt;code&gt;List&lt;/code&gt; of &lt;code&gt;int Tuples&lt;/code&gt;, each representing an &lt;strong&gt;undirected&lt;/strong&gt; edge.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;FILENAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"CA-HepPh.txt"&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;parse_edges_dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;FILENAME&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;]]:&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;lines&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;readlines&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;edges&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"#"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;
        &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;line_parts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\t&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;edge&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;line_parts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;line_parts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;edge&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;edge&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;edges&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;
        &lt;span class="n"&gt;edges&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;edge&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;edges&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We need to create &lt;a href="https://memgraph.com/docs/cypher-manual/"&gt;Cypher&lt;/a&gt; queries from the given &lt;strong&gt;undirected&lt;/strong&gt; edges. If you don't know anything about Cypher, here is a short &lt;a href="https://memgraph.com/docs/cypher-manual/reading-existing-data"&gt;getting started&lt;/a&gt; guide. You can also learn a lot about graph algorithms and Cypher queries on &lt;a href="https://playground.memgraph.com/"&gt;Memgraph Playground&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We need to &lt;strong&gt;create queries&lt;/strong&gt; from the edges so that we can run each query and import the data into &lt;strong&gt;Memgraph&lt;/strong&gt;. Let's use the &lt;code&gt;MERGE&lt;/code&gt; clause, which ensures that the pattern we are looking for exists only once in the database after the query runs. That means that if the pattern (a node or an edge) is not found, it will be created.&lt;/p&gt;
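&lt;p&gt;For example, substituting a single edge into a &lt;code&gt;MERGE&lt;/code&gt;-based template produces a query like the one below (a simplified sketch with a single label parameter; the full script parameterizes both endpoints separately):&lt;/p&gt;

```python
from string import Template

# Simplified sketch of the per-edge query template (one $node_name
# parameter here; the actual script uses one per endpoint).
edge_template = Template(
    'MERGE (a:$node_name {id: $id_a}) '
    'MERGE (b:$node_name {id: $id_b}) '
    'CREATE (a)-[:$edge_name]->(b);'
)

query = edge_template.substitute(
    node_name="Collaborator", edge_name="COLLABORATED_WITH",
    id_a=17010, id_b=1943,
)
print(query)
```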

&lt;p&gt;Now, let's create the queries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;NODE_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"Collaborator"&lt;/span&gt;
&lt;span class="n"&gt;EDGE_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"COLLABORATED_WITH"&lt;/span&gt;

&lt;span class="n"&gt;edge_template&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;'MERGE (a:$node_name_a {id: $id_a}) MERGE (b:$node_name_b {id: $id_b}) CREATE (a)-[:$edge_name]-&amp;gt;(b);'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;



&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_queries&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;edges&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;]]):&lt;/span&gt;
    &lt;span class="n"&gt;queries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"CREATE INDEX ON :{node_name}(id);"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;NODE_NAME&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;target&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;edges&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;queries&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;edge_template&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;substitute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id_a&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                                &lt;span class="n"&gt;id_b&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                                &lt;span class="n"&gt;node_name_a&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;NODE_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                                &lt;span class="n"&gt;node_name_b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;NODE_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                                &lt;span class="n"&gt;edge_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;EDGE_NAME&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;queries&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;edges&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;parse_edges_dataset&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;queries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;create_queries&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;edges&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nb"&gt;file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;OUTPUT_FILE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'w'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;queries&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;'__main__'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;create_queries()&lt;/code&gt; function returns a list of strings, each representing a query we can run against our database.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: you can import datasets through one of the querying tools. We have developed our drivers using the &lt;a href="https://memgraph.com/blog/memgraph-1-2-release-implementing-the-bolt-protocol-v4"&gt;Bolt protocol&lt;/a&gt; to deliver better performance. You can use &lt;a href="https://memgraph.com/product/lab"&gt;&lt;strong&gt;Memgraph Lab&lt;/strong&gt;&lt;/a&gt;, &lt;a href="https://github.com/memgraph/mgconsole"&gt;&lt;strong&gt;mgconsole&lt;/strong&gt;&lt;/a&gt; or even one of our drivers, like the &lt;strong&gt;Python driver&lt;/strong&gt; used in this tutorial.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We recommend you use &lt;a href="https://memgraph.com/product/lab"&gt;&lt;strong&gt;Memgraph Lab&lt;/strong&gt;&lt;/a&gt; due to the simple visualization, ease of use, export and import features, and memory usage. But here, we will use a Python driver in the form of &lt;strong&gt;gqlalchemy&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Ic6w8Bni--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/link-prediction-with-node2vec-in-physics-collaboration-network/memgraph-tutorial-memgraph-lab-interface.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Ic6w8Bni--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/link-prediction-with-node2vec-in-physics-collaboration-network/memgraph-tutorial-memgraph-lab-interface.png" alt="memgraph-tutorial-memgraph-lab-interface" width="800" height="510"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt; Image 1. &lt;a href="https://memgraph.com/product/lab"&gt;Memgraph Lab&lt;/a&gt; interface &lt;/center&gt;



&lt;h2&gt;
  
  
  2. Splitting edges into test and train sets
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Theory
&lt;/h3&gt;

&lt;p&gt;First, we need to split our edges into a testing (&lt;strong&gt;test&lt;/strong&gt;) and training (&lt;strong&gt;train&lt;/strong&gt;) set. Let's explain why.&lt;/p&gt;

&lt;p&gt;Our goal is to perform &lt;strong&gt;link prediction&lt;/strong&gt;, which means we need to correctly predict new edges that might appear based on the existing ones. Since this is a static dataset, no new edges will actually appear. To test the algorithm, we remove a portion of the existing edges and make predictions based on the remaining ones. A correct prediction recreates the edges we removed; in the best-case scenario, we would recover the original dataset.&lt;/p&gt;

&lt;p&gt;We will randomly &lt;strong&gt;remove 20%&lt;/strong&gt; of the edges. These will represent our &lt;strong&gt;test set&lt;/strong&gt;. We will leave &lt;strong&gt;all&lt;/strong&gt; the nodes in the graph, even though some of them may end up completely disconnected. Next, we will run &lt;strong&gt;node2vec&lt;/strong&gt; on the remaining edges (&lt;strong&gt;80%&lt;/strong&gt; of them; in our case, roughly 94000 edges) to get the node embeddings. We will use these node embeddings to &lt;strong&gt;predict new edges&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;You can imagine this case as a &lt;strong&gt;Twitter web&lt;/strong&gt;, where new connections (follows) appear every second, and we want to be able to predict new connections from connections we already have. &lt;/p&gt;

&lt;p&gt;How exactly we will predict which edges will appear is still left to explain, but we hope that you understand the &lt;strong&gt;WHY&lt;/strong&gt; part of removing 20% of the edges. 🤞&lt;/p&gt;
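&lt;p&gt;The split itself is simple. The tutorial uses scikit-learn's &lt;code&gt;train_test_split&lt;/code&gt; for it; a dependency-free sketch of the same idea looks like this:&lt;/p&gt;

```python
import random

def split_edges(edges, test_size=0.2, seed=0):
    """Shuffle the edges and hold out test_size of them,
    mirroring what scikit-learn's train_test_split does."""
    rng = random.Random(seed)
    shuffled = list(edges)      # keep the original list intact
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_size))
    return shuffled[:cut], shuffled[cut:]

train, test = split_edges([(i, i + 1) for i in range(100)])
print(len(train), len(test))  # 80 20
```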

&lt;h3&gt;
  
  
  Practical
&lt;/h3&gt;

&lt;p&gt;First, we need a connection to Memgraph so we can fetch the edges and split them into two parts (a train set and a test set). For the edge splitting, we will use &lt;a href="https://scikit-learn.org/"&gt;&lt;strong&gt;scikit-learn&lt;/strong&gt;&lt;/a&gt;, and to connect to &lt;strong&gt;Memgraph&lt;/strong&gt; we will use &lt;a href="https://github.com/memgraph/gqlalchemy"&gt;gqlalchemy&lt;/a&gt;. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;From &lt;strong&gt;&lt;a href="https://github.com/memgraph/gqlalchemy"&gt;GitHub description of gqlalchemy&lt;/a&gt;&lt;/strong&gt;:&lt;br&gt;
"GQLAlchemy is a library developed to assist in writing and running queries on Memgraph. GQLAlchemy supports high-level connection to Memgraph as well as modular query builder."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;After we create a connection to Memgraph, we will use the two functions below to run queries. A query can be anything from fetching edges and removing edges to running the &lt;strong&gt;node2vec&lt;/strong&gt; procedure.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;memgraph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;gqlalchemy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Memgraph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"127.0.0.1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;7687&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_a_query_and_fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Iterator&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]]:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;memgraph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;execute_and_fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call_a_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;memgraph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Okay, so to get the edges we need to run a query. With the connection in place, we will fetch the edges, split them into two sets, and then create queries (&lt;strong&gt;plural&lt;/strong&gt;) to remove each test-set edge from the graph.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;edge_remove_template&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;'MATCH (a:$node_a_name{id: $node_a_id})-[edge]-(b:$node_b_name{id: $node_b_id}) DELETE edge;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_all_edges&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;gqlalchemy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gqlalchemy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Node&lt;/span&gt;&lt;span class="p"&gt;]]:&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Match&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; \
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dataset_parser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NODE_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;variable&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"node_a"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; \
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;to&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dataset_parser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;EDGE_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;variable&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"edge"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; \
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dataset_parser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NODE_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;variable&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"node_b"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; \
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"node_a"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"node_b"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;remove_edges&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;edges&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;gqlalchemy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gqlalchemy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Node&lt;/span&gt;&lt;span class="p"&gt;]])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;queries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;edge_remove_template&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;substitute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node_a_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;dataset_parser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NODE_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                               &lt;span class="n"&gt;node_a_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;edge&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                                               &lt;span class="n"&gt;node_b_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;dataset_parser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NODE_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                               &lt;span class="n"&gt;node_b_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;edge&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;edge&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;edges&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;queries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;call_a_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;split_edges_train_test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;edges&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;gqlalchemy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gqlalchemy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Node&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt; &lt;span class="n"&gt;test_size&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;gqlalchemy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gqlalchemy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Node&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;gqlalchemy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Node&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gqlalchemy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Node&lt;/span&gt;&lt;span class="p"&gt;]]):&lt;/span&gt;
    &lt;span class="n"&gt;edges_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;edges_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;train_test_split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;edges&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;test_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;edges_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;edges_test&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will be the "main" part of our program. There are a few things worth noticing here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When getting all edges with a query, instead of an edge object we get two nodes (&lt;code&gt;gqlalchemy.Node&lt;/code&gt; objects): one represents the head and the other the tail of the edge, but we will treat the graph as &lt;strong&gt;undirected&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;split_edges_train_test()&lt;/code&gt; function accepts these edges and splits them into a train and test set.&lt;/li&gt;
&lt;li&gt;We received node objects, but it is easier to work with the &lt;code&gt;id&lt;/code&gt; property of each node, so we map our list of edges to a list of integer tuples, where each pair represents one undirected &lt;strong&gt;edge&lt;/strong&gt;.
&lt;/li&gt;
&lt;/ul&gt;
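The id-tuple mapping from the last bullet can be sketched like this (a minimal sketch that uses a hypothetical `FakeNode` stand-in for `gqlalchemy.Node`, since no live Memgraph connection is assumed here):

```python
from typing import List, Tuple


class FakeNode:
    """Minimal stand-in for gqlalchemy.Node, exposing only `properties`."""

    def __init__(self, node_id: int):
        self.properties = {"id": node_id}


def edges_to_id_pairs(edges: List[Tuple[FakeNode, FakeNode]]) -> List[Tuple[int, int]]:
    # Keep only the `id` property of each endpoint; one pair = one undirected edge.
    return [(a.properties["id"], b.properties["id"]) for a, b in edges]


edges = [(FakeNode(1), FakeNode(2)), (FakeNode(2), FakeNode(3))]
print(edges_to_id_pairs(edges))  # [(1, 2), (2, 3)]
```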

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Getting all edges..."&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;edges&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;get_all_edges&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Current number of edges is {}"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;edges&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;

    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Splitting edges in train, test group..."&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;edges_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;edges_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;split_edges_train_test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;edges&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;edges&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Splitting edges done."&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Removing edges from graph."&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;remove_edges&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;edges_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Edges removed."&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;train_edges_dict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{(&lt;/span&gt;&lt;span class="n"&gt;node_from&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;node_to&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;node_from&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;node_to&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;edges_train&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;test_edges_dict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{(&lt;/span&gt;&lt;span class="n"&gt;node_from&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;node_to&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;node_from&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;node_to&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;edges_test&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  3. Run node2vec on the train set to generate node embeddings
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Theory
&lt;/h3&gt;

&lt;p&gt;After we have removed the test edges, we need to run the &lt;strong&gt;node2vec&lt;/strong&gt; algorithm. Node embeddings will be calculated from the train set of edges only. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Repeat&lt;/strong&gt;: we will get embeddings for every node, but to compute them we will only use a certain amount of edges (80%) from the original graph. If a new node were to appear in the graph, we couldn't predict anything for it, since we don't know it exists yet. We can only make predictions for the nodes already in the graph. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Practical
&lt;/h3&gt;

&lt;p&gt;Here we will call the &lt;strong&gt;node2vec &lt;a href="https://memgraph.com/docs/memgraph/reference-guide/query-modules"&gt;query module&lt;/a&gt;&lt;/strong&gt; to calculate node embeddings. There is a procedure called &lt;code&gt;set_embeddings()&lt;/code&gt; in the &lt;code&gt;node2vec&lt;/code&gt; module, which we will use to store the embeddings in the graph as node properties. That way the embeddings are persisted together with the rest of the graph data, so we can recover them even after a restart, although &lt;strong&gt;Memgraph&lt;/strong&gt; is primarily an in-memory database.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Node2Vec&lt;/strong&gt; has some crucial hyperparameters like &lt;code&gt;num_walks&lt;/code&gt; and &lt;code&gt;walk_length&lt;/code&gt;. Setting them to higher values makes the algorithm run longer, but should yield &lt;strong&gt;better predictions&lt;/strong&gt;, as long as the embeddings don't overfit the data we have.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--gkFvPe0y--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/link-prediction-with-node2vec-in-physics-collaboration-network/memgraph-tutorial-hyperparameters.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--gkFvPe0y--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/link-prediction-with-node2vec-in-physics-collaboration-network/memgraph-tutorial-hyperparameters.jpg" alt="memgraph-tutorial-hyperparameters" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt; Image 2. The algorithm's results are dependent on how we set our &lt;a href="https://en.wikipedia.org/wiki/Hyperparameter_(machine_learning)"&gt;hyperparameters&lt;/a&gt;
&lt;/center&gt;



&lt;p&gt;Another problem we need to handle is setting proper &lt;code&gt;p&lt;/code&gt; and &lt;code&gt;q&lt;/code&gt; parameters. Since we are dealing with a collaboration network, we will try to predict connections inside natural clusters. We can obtain such clusters by sampling walks in a more DFS-like manner. If these terms sound confusing, we suggest checking out the blog post on &lt;strong&gt;&lt;a href="https://memgraph.com/blog/how-node2vec-works"&gt;node2vec&lt;/a&gt;&lt;/strong&gt; where we have explained them. 💪 &lt;/p&gt;

&lt;p&gt;If we were to set the &lt;strong&gt;node2vec&lt;/strong&gt; params in a more &lt;strong&gt;BFS&lt;/strong&gt;-like manner, so that hyperparameter &lt;code&gt;p&lt;/code&gt; is smaller than hyperparameter &lt;code&gt;q&lt;/code&gt;, we would be looking for &lt;strong&gt;hubs&lt;/strong&gt;, which isn't our intention.&lt;br&gt;
&lt;/p&gt;
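To make the `p`/`q` intuition concrete, here is a small sketch of the unnormalized transition bias from the node2vec paper (a hypothetical `transition_weight` helper, not part of the tutorial's code). With the tutorial's values, p = 1 and q = 1/256, a move to a node at distance 2 from the previous node gets weight 256, which is exactly what makes the walks DFS-like:

```python
def transition_weight(prev, cur, nxt, graph, p, q):
    """Unnormalized node2vec bias for stepping from `cur` to `nxt`,
    given that the walk arrived at `cur` from `prev`.
    `graph` is an adjacency dict mapping each node to a set of neighbors."""
    if nxt == prev:
        return 1.0 / p  # return parameter: revisit the previous node
    if nxt in graph[prev]:
        return 1.0      # nxt is at distance 1 from prev
    return 1.0 / q      # distance 2 from prev: small q makes this DFS-like


graph = {0: {1}, 1: {0, 2, 3}, 2: {1, 3}, 3: {1, 2}}
print(transition_weight(0, 1, 0, graph, p=1, q=1 / 256))  # 1.0 (back to prev)
print(transition_weight(0, 1, 2, graph, p=1, q=1 / 256))  # 256.0 (move outward)
```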

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# NODE2VEC PARAMS
&lt;/span&gt;&lt;span class="n"&gt;is_directed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
&lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;  &lt;span class="c1"&gt;# return parameter
&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;256&lt;/span&gt;  &lt;span class="c1"&gt;# in-out parameter
&lt;/span&gt;&lt;span class="n"&gt;num_walks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;
&lt;span class="n"&gt;walk_length&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt;
&lt;span class="n"&gt;vector_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
&lt;span class="n"&gt;alpha&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.02&lt;/span&gt;
&lt;span class="n"&gt;window&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;
&lt;span class="n"&gt;min_count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="n"&gt;seed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="n"&gt;workers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;
&lt;span class="n"&gt;min_alpha&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.0001&lt;/span&gt;
&lt;span class="n"&gt;sg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="n"&gt;hs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="n"&gt;negative&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;
&lt;span class="n"&gt;epochs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;set_node_embeddings&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;call_a_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"""CALL node2vec.set_embeddings({is_directed},{p}, {q}, {num_walks}, {walk_length}, {vector_size}, 
    {alpha}, {window}, {min_count}, {seed}, {workers}, {min_alpha}, {sg}, {hs}, {negative}) YIELD *"""&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;is_directed&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;is_directed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_walks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;num_walks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;walk_length&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;walk_length&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vector_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;vector_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;alpha&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;window&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;window&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;min_count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;min_count&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;seed&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;seed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;workers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;workers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;min_alpha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;min_alpha&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sg&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;sg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;hs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;hs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;negative&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;negative&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_embeddings_as_properties&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Match&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; \
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dataset_parser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NODE_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;variable&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"node"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; \
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;gqlalchemy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Node&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"node"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="s"&gt;"embedding"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;
        &lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"embedding"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;embeddings&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And this is our main part. After the &lt;strong&gt;node2vec query module&lt;/strong&gt; finishes its calculations, we can get those embeddings directly from the graph, which is awesome.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;test_edges_dict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{(&lt;/span&gt;&lt;span class="n"&gt;edge&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;edge&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;edge&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;edges_test&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;# Calculate and get node embeddings
&lt;/span&gt;    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Setting node embeddings as graph property..."&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;set_node_embeddings&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Embedding for every node set."&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;node_emeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;get_embeddings_as_properties&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  4. Get potential edges from embeddings
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Theory
&lt;/h3&gt;

&lt;p&gt;And now to the most important section ⟶ &lt;strong&gt;edge prediction&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;How do we predict edges exactly? What is the idea behind it?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We expect nodes that have similar embeddings and still don't have an edge between them to form a new edge in the future. It's as simple as that. &lt;/p&gt;

&lt;p&gt;We just need to find a good measure to be able to check whether two nodes have similar embeddings. One such measure is &lt;strong&gt;cosine similarity&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--CXrUvWxO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/link-prediction-with-node2vec-in-physics-collaboration-network/memgraph-tutorial-cosine-similarity.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--CXrUvWxO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/link-prediction-with-node2vec-in-physics-collaboration-network/memgraph-tutorial-cosine-similarity.png" alt="memgraph-tutorial-cosine-similarity" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt; Image 3. Cosine similarity between two vectors A and B&lt;/center&gt;



&lt;p&gt;Image 3 above contains an explanation of cosine similarity, the measure that will tell us how similar two vectors are. It is essentially the &lt;strong&gt;cosine of the angle&lt;/strong&gt; between the two vectors. Notice that node embeddings are also vectors in &lt;strong&gt;multi-dimensional space&lt;/strong&gt;.&lt;/p&gt;
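As a quick sanity check, the formula from Image 3 can be computed with NumPy in a few lines (a hypothetical `cosine_similarity` helper, shown only to illustrate the measure):

```python
import numpy as np


def cosine_similarity(a, b) -> float:
    # cos(angle) = dot(A, B) / (norm(A) * norm(B))
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal vectors)
print(round(cosine_similarity([1.0, 2.0], [2.0, 4.0]), 6))  # 1.0 (same direction)
```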

&lt;h3&gt;
  
  
  Practical
&lt;/h3&gt;

&lt;p&gt;So for every pair of node embeddings, we will calculate the cosine similarity to check how similar the two embeddings are. The problem is that with 12,000 nodes there are around 72 million pairs (72,000,000), which means that an average computer with 16GB of RAM would die at some point (open up a Chrome tab if you dare). To fix that, we will only hold a maximum of 2 million pairs in memory at any given point in time, and periodically run a sorting step to keep only the top &lt;em&gt;K&lt;/em&gt; pairs. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What would be this top K number?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We will answer this question shortly and it's related to the &lt;strong&gt;&lt;em&gt;precision@K&lt;/em&gt;&lt;/strong&gt; measurement method.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;calculate_adjacency_matrix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_edge_weight&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;j&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="n"&gt;nodes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="n"&gt;nodes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nodes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;adj_mtx_r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="n"&gt;cnt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;pair&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;itertools&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;combinations&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nodes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;cnt&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;1000000&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;adj_mtx_r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;adj_mtx_r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])}&lt;/span&gt;
            &lt;span class="n"&gt;adj_mtx_r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;adj_mtx_r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;adj_mtx_r&lt;/span&gt;&lt;span class="p"&gt;)[:&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;PRECISION_AT_K_CONST&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
            &lt;span class="n"&gt;gc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;collect&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;cnt&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;10000&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cnt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;weight&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;get_edge_weight&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pair&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;pair&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;weight&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;
        &lt;span class="n"&gt;cnt&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
        &lt;span class="n"&gt;adj_mtx_r&lt;/span&gt;&lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;pair&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;pair&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;get_edge_weight&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pair&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;pair&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;adj_mtx_r&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  5. Rank potential edges to get top K predictions
&lt;/h2&gt;

&lt;p&gt;To calculate the accuracy of our implementation, we will use a well-known evaluation metric called &lt;a href="https://stackoverflow.com/questions/55748792/understanding-precisionk-apk-mapk"&gt;&lt;strong&gt;&lt;em&gt;precision@K&lt;/em&gt;&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Some nodes (or rather, their embeddings) will be more similar than others, meaning their &lt;strong&gt;cosine similarity&lt;/strong&gt; will be larger. Now, say your manager arrives and asks for the top 10 predictions. Would you hand over the pairs with lower or higher similarities? The best ones, of course.&lt;/p&gt;
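&lt;p&gt;As a small illustration of ranking by similarity, here is a minimal, self-contained sketch. The embeddings below are toy values, not real &lt;strong&gt;node2vec&lt;/strong&gt; output:&lt;/p&gt;

```python
import math

# Toy embeddings for three hypothetical nodes; real ones come from node2vec.
emb = {1: [0.9, 0.1], 2: [0.8, 0.2], 3: [-0.5, 0.9]}

def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Rank candidate node pairs so the most similar (best) predictions come first.
pairs = [(1, 2), (1, 3), (2, 3)]
ranked = sorted(pairs, key=lambda p: -cosine_similarity(emb[p[0]], emb[p[1]]))
print(ranked[0])  # the most similar pair is ranked first
```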

&lt;p&gt;The same principle applies here. We will take the top K predictions and evaluate our model: at every position, we count how many correct guesses we have made so far and divide that number by the number of attempts so far.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--G-AQR00i--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/link-prediction-with-node2vec-in-physics-collaboration-network/memgraph-tutorial-precision-k-method.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--G-AQR00i--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/link-prediction-with-node2vec-in-physics-collaboration-network/memgraph-tutorial-precision-k-method.png" alt="memgraph-tutorial-precision-k-method" width="800" height="187"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;Image 4. Example of &lt;b&gt;precision@K&lt;/b&gt; method &lt;/center&gt;



&lt;p&gt;This is how it would work for &lt;strong&gt;&lt;em&gt;precision@6&lt;/em&gt;&lt;/strong&gt;:&lt;br&gt;
The first position is easy: 1 correct guess / 1 attempt. At the second position we have 1 correct guess / 2 attempts, and so on.&lt;br&gt;
&lt;/p&gt;
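&lt;p&gt;The same running calculation can be sketched for a hypothetical sequence of guesses, where 1 marks a correct prediction and 0 a wrong one:&lt;/p&gt;

```python
# Hypothetical outcomes of the top 6 predictions (1 = correct, 0 = wrong).
hits = [1, 0, 1, 1, 0, 1]

precision_at_k = []
correct = 0
for k, hit in enumerate(hits, start=1):
    correct += hit
    # (number of correct guesses so far) / (number of attempts so far)
    precision_at_k.append(correct / k)

print(precision_at_k)  # approximately [1.0, 0.5, 0.67, 0.75, 0.6, 0.67]
```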

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;compute_precision_at_k&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;predicted_edges&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                           &lt;span class="n"&gt;test_edges&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;max_k&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;precision_scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;  &lt;span class="c1"&gt;# precision at k
&lt;/span&gt;    &lt;span class="n"&gt;delta_factors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="n"&gt;correct_edge&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;edge&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;predicted_edges&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;max_k&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;

        &lt;span class="c1"&gt;# if our guessed edge is really in graph
&lt;/span&gt;        &lt;span class="c1"&gt;# this is due representation problem: (2,1) edge in undirected graph is saved in memory as (2,1)
&lt;/span&gt;        &lt;span class="c1"&gt;# but in adj matrix it is calculated as (1,2)
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;edge&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;test_edges&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;edge&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;edge&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;test_edges&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;correct_edge&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
            &lt;span class="n"&gt;delta_factors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;delta_factors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;precision_scores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;correct_edge&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  &lt;span class="c1"&gt;# (number of correct guesses) / (number of attempts)
&lt;/span&gt;        &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;precision_scores&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;delta_factors&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is the main part of the code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="c1"&gt;# Calculate adjacency matrix
&lt;/span&gt;    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Calculating adjacency matrix from embeddings."&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;adj_matrix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;calculate_adjacency_matrix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;node_emeddings&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Adjacency matrix calculated"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# print(adj_matrix)
&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Getting predicted edges..."&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;predicted_edge_list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;adj_matrix&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Predicted edge list is of length:"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;predicted_edge_list&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Sorting predicted edge list"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# We need to sort predicted edges so that ones that are most likely to appear are first in list
&lt;/span&gt;    &lt;span class="n"&gt;sorted_predicted_edges&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;predicted_edge_list&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])}&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Predicted edges sorted..."&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Filtering predicted edges that are not in train list..."&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# taking only edges that we are predicting to appear, not ones that are already in the graph
&lt;/span&gt;    &lt;span class="n"&gt;sorted_predicted_edges&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;sorted_predicted_edges&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;train_edges_dict&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="c1"&gt;# print(sorted_predicted_edges)
&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Calculating precision@k..."&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;precision_scores&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;delta_factors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;compute_precision_at_k&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;predicted_edges&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;sorted_predicted_edges&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                                             &lt;span class="n"&gt;test_edges&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;test_edges_dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                                                             &lt;span class="n"&gt;max_k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;PRECISION_AT_K_CONST&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"precision score"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;precision_scores&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"../results.txt"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'a+'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;fh&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;fh&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;" "&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;precision&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;precision&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;precision_scores&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;fh&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
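&lt;p&gt;The sort-then-filter pattern used above can be shown on toy data: rank the predicted edges by score in descending order, then drop the ones already present in the training set. The dictionaries below are made up for illustration:&lt;/p&gt;

```python
# Toy predicted edges with similarity scores, and a toy training edge set.
predicted = {(1, 2): 0.9, (2, 3): 0.4, (1, 3): 0.7}
train_edges = {(1, 2): 1}

# Sort by score, descending, so the most likely edges come first.
ranked = {k: v for k, v in sorted(predicted.items(), key=lambda item: -item[1])}

# Keep only edges that are not already in the graph (training set).
candidates = {k: v for k, v in ranked.items() if k not in train_edges}

print(list(candidates))  # [(1, 3), (2, 3)]
```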



&lt;h2&gt;
  
  
  6. Compare predicted edges with the test set
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--3sTkKh5D--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/link-prediction-with-node2vec-in-physics-collaboration-network/memgraph-tutorial-precision-k-graph.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--3sTkKh5D--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/link-prediction-with-node2vec-in-physics-collaboration-network/memgraph-tutorial-precision-k-graph.png" alt="memgraph-tutorial-precision-k-graph" width="800" height="593"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;Image 5. The graph of &lt;b&gt;precision@k&lt;/b&gt; in our example&lt;/center&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;matplotlib&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;

&lt;span class="c1"&gt;#tribute to https://stackoverflow.com/questions/12957582/plot-yerr-xerr-as-shaded-region-rather-than-error-bars
&lt;/span&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;'../results.txt'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'r'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;fh&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;lines&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fh&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;readlines&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;


&lt;span class="n"&gt;parsed_results&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;values&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;" "&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;parsed_list&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;parsed_results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parsed_list&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;stddev&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parsed_results&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;mean&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parsed_results&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parsed_results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parsed_results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mean&lt;/span&gt;
&lt;span class="n"&gt;error&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;stddev&lt;/span&gt;


&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;'k-'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fill_between&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After running our code a couple of times, we can plot the results. Since we only worked with the graph structure and didn't take any node features into account when doing link prediction, the results are good: there is room for improvement, but for the top 16 predicted edges we get a precision of around 70%. &lt;strong&gt;MAGE&lt;/strong&gt; is satisfied for the moment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;So that's it for the real-time link prediction tutorial. We hope you learned something and that we got you even more interested in graph analytics. If you got lost at any point during the tutorial, here is a link to the &lt;strong&gt;&lt;a href="https://github.com/memgraph/link-prediction-node-embeddings"&gt;GitHub repository for link prediction with MAGE&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Our team of engineers is currently tackling the problem of graph analytics algorithms on &lt;strong&gt;real-time data&lt;/strong&gt;. If you want to discuss how to apply &lt;strong&gt;online/streaming algorithms&lt;/strong&gt; on connected data, feel free to join our &lt;strong&gt;&lt;a href="https://memgr.ph/join-discord"&gt;Discord server&lt;/a&gt;&lt;/strong&gt; and message us.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MAGE&lt;/strong&gt; shares his wisdom on a &lt;a href="https://twitter.com/intent/follow?screen_name=memgraphmage"&gt;&lt;strong&gt;Twitter&lt;/strong&gt; account&lt;/a&gt;. Get to know him better by following him 🐦&lt;/p&gt;

&lt;p&gt;&lt;a href="https://twitter.com/intent/follow?screen_name=memgraphmage"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--c_PN5axd--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://public-assets.memgraph.com/link-prediction-with-node2vec-in-physics-collaboration-network/memgraph-tutorial-link-prediction-twitter.jpg" alt="memgraph-tutorial-link-prediction-twitter" width="800" height="207"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Last but not least, check out &lt;a href="https://github.com/memgraph/mage"&gt;&lt;strong&gt;MAGE&lt;/strong&gt;&lt;/a&gt; and don’t hesitate to give it a star ⭐ or contribute new ideas.&lt;/p&gt;

</description>
      <category>node2vec</category>
      <category>algorithms</category>
      <category>graphdatabase</category>
      <category>memgraph</category>
    </item>
    <item>
      <title>Recommendation System Using Online Node2Vec With Memgraph MAGE</title>
      <dc:creator>Antonio Filipovic</dc:creator>
      <pubDate>Thu, 01 Jun 2023 11:22:25 +0000</pubDate>
      <link>https://dev.to/memgraph/recommendation-system-using-online-node2vec-with-memgraph-mage-7fi</link>
      <guid>https://dev.to/memgraph/recommendation-system-using-online-node2vec-with-memgraph-mage-7fi</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;The online node2vec algorithm learns and updates temporal node embeddings on the fly for tracking and measuring node similarity over time in graph streams. &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Our little magician Memgraph MAGE has recently received one more spell - the &lt;a href="https://memgraph.com/docs/mage/query-modules/python/node2vec-online" rel="noopener noreferrer"&gt;&lt;strong&gt;Online Node2Vec algorithm&lt;/strong&gt;&lt;/a&gt;. Since he is still too scared to use it, you, as a brave spirit, will step up and use it on a real challenge to show MAGE how it's done. This challenge includes building an &lt;strong&gt;Online Recommendation System&lt;/strong&gt; using k-means clustering and the newborn spell - Online Node2Vec algorithm.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;To complete this tutorial, you will need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An installation of &lt;a href="https://memgraph.com/mage" rel="noopener noreferrer"&gt;Memgraph Advanced Graph Extensions (MAGE)&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;An installation of &lt;a href="https://memgraph.com/product/lab" rel="noopener noreferrer"&gt;Memgraph Lab&lt;/a&gt;  or usage of Memgraph's command-line tool, &lt;a href="https://docs.memgraph.com/memgraph/connect-to-memgraph/methods/mgconsole/" rel="noopener noreferrer"&gt;mgconsole&lt;/a&gt;, which is installed together with Memgraph.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: In short, you could have one of the following setups: &lt;br&gt;
1) Memgraph installed (not the Docker version), and Memgraph MAGE built from source&lt;br&gt;
2) The Memgraph MAGE Docker image&lt;br&gt;
3) The Memgraph Docker image, but you have to additionally copy the MAGE directory inside the container, run &lt;code&gt;python build&lt;/code&gt; and copy the created &lt;code&gt;mage/dist&lt;/code&gt; to &lt;code&gt;/usr/lib/memgraph/query_modules&lt;/code&gt; so Memgraph can access it.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;To check that you have everything ready for the next steps, use the following command in &lt;code&gt;mgconsole&lt;/code&gt; or &lt;code&gt;Memgraph Lab&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;node2vec_online.help&lt;/span&gt;&lt;span class="ss"&gt;()&lt;/span&gt; &lt;span class="k"&gt;YIELD&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="ss"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Graph
&lt;/h2&gt;

&lt;p&gt;You will use the spell on the &lt;a href="https://snap.stanford.edu/data/cit-HepPh.html" rel="noopener noreferrer"&gt;High-energy physics citation network&lt;/a&gt;. The already processed &lt;a href="https://download.memgraph.com/datasets/physics-citation-network/physics-citation-network.cypherl.gz" rel="noopener noreferrer"&gt;dataset&lt;/a&gt; contains 395 papers (nodes) and 1106 citations (edges). If a paper &lt;code&gt;i&lt;/code&gt; cites paper &lt;code&gt;j&lt;/code&gt;, the graph contains a directed edge from &lt;code&gt;i&lt;/code&gt; to &lt;code&gt;j&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;Below is the graph schema:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fonline-node2vec-recommendation-system%2Fmemgraph-tutorial-schema-online-node2vec-recommendation-system.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fonline-node2vec-recommendation-system%2Fmemgraph-tutorial-schema-online-node2vec-recommendation-system.jpg" alt="memgraph-tutorial-schema-online-node2vec-recommendation-system"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There is only one type of node in our graph schema. MAGE is happy to hear this kind of news since he believes that this way, his spell will give you the best results.&lt;/p&gt;

&lt;p&gt;Before importing the dataset, MAGE wants you to read the instructions about the spell to learn how to use it properly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Online Node2Vec
&lt;/h2&gt;

&lt;p&gt;In the &lt;a href="https://appliednetsci.springeropen.com/articles/10.1007/s41109-019-0169-5" rel="noopener noreferrer"&gt;MAGE instructions&lt;/a&gt;, there is a note that researchers have shown how the &lt;code&gt;Node2Vec Online&lt;/code&gt; spell creates similar embeddings for two nodes (e.g. &lt;code&gt;v&lt;/code&gt; and &lt;code&gt;u&lt;/code&gt;) if one can be reached from the other across recently appeared edges. In other words, the embedding of node &lt;code&gt;v&lt;/code&gt; should be more similar to the embedding of node &lt;code&gt;u&lt;/code&gt; if we can reach &lt;code&gt;u&lt;/code&gt; by taking steps backward from node &lt;code&gt;v&lt;/code&gt; across edges, each of which appeared earlier than the previous one. These backward steps from one node to the other form a temporal walk. It is temporal because it depends on when each edge appeared in the graph.&lt;/p&gt;

&lt;p&gt;To make two nodes more similar and to create these temporal walks, the &lt;code&gt;Node2Vec Online&lt;/code&gt; spell uses the &lt;code&gt;StreamWalk updater&lt;/code&gt; and &lt;code&gt;Word2Vec learner&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;&lt;code&gt;StreamWalk updater&lt;/code&gt; is a machine for sampling temporal walks. Walk sampling is done backward, because we look only at the incoming edges of a node. Since a node can have multiple incoming edges, the &lt;code&gt;StreamWalk updater&lt;/code&gt; uses probabilities to decide which incoming edge to take next, thereby leading to a new node. These probabilities are computed after an edge arrives and before temporal walk sampling. Each probability represents a sum over all temporal walks &lt;code&gt;z&lt;/code&gt; ending in node &lt;code&gt;v&lt;/code&gt; that use edges arriving no later than the latest edge already sampled in the walk. When the algorithm decides which edge to take next for temporal walk creation, it uses these precomputed weights (probabilities). Every time a new edge appears in the graph, the probabilities are updated only for the two nodes of that edge.&lt;/p&gt;
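&lt;p&gt;One backward step of this sampling can be sketched as follows. The predecessor nodes and weights below are hypothetical; the real updater maintains such weights incrementally as edges arrive:&lt;/p&gt;

```python
import random

# Hypothetical precomputed weights for the incoming edges of the current node:
# predecessor node -> weight (probability mass of temporal walks through it).
incoming = {"u1": 0.6, "u2": 0.3, "u3": 0.1}

def sample_backward_step(incoming_weights, rng=random):
    # Pick the previous node with probability proportional to its weight.
    nodes = list(incoming_weights)
    weights = [incoming_weights[n] for n in nodes]
    return rng.choices(nodes, weights=weights, k=1)[0]

prev_node = sample_backward_step(incoming)
```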

&lt;p&gt;After walk sampling, the &lt;code&gt;Word2Vec learner&lt;/code&gt; uses the prepared temporal walks to make node embeddings more similar with the &lt;code&gt;gensim Word2Vec&lt;/code&gt; module. The sampled walks are given as sentences to the &lt;code&gt;gensim Word2Vec&lt;/code&gt; module, which then optimizes the similarity of node embeddings within a walk by stochastic gradient descent, using either a skip-gram or a continuous bag-of-words (CBOW) model.&lt;/p&gt;
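&lt;p&gt;To give a feel for the skip-gram side, here is a minimal sketch of how a sampled walk can be turned into (center, context) training pairs with a context window. This illustrates the idea only; it is not the &lt;code&gt;gensim&lt;/code&gt; internals:&lt;/p&gt;

```python
# Turn one walk (a "sentence" of node IDs) into skip-gram training pairs:
# each node is paired with every neighbor inside the context window.
def skipgram_pairs(walk, window=2):
    pairs = []
    for i, center in enumerate(walk):
        lo, hi = max(0, i - window), min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs

walk = ["a", "b", "c", "d"]  # a hypothetical sampled temporal walk
print(skipgram_pairs(walk, window=1))
# [('a', 'b'), ('b', 'a'), ('b', 'c'), ('c', 'b'), ('c', 'd'), ('d', 'c')]
```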

&lt;blockquote&gt;
&lt;p&gt;Note: &lt;a href="https://github.com/memgraph/mage/tree/main/python/mage/node2vec_online_module" rel="noopener noreferrer"&gt;this implementation&lt;/a&gt; contains &lt;a href="https://github.com/ferencberes/online-node2vec" rel="noopener noreferrer"&gt;the code&lt;/a&gt; related to the research of Ferenc Béres, Róbert Pálovics, Domokos Miklós Kelen and András A. Benczúr.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Node2Vec Online setup
&lt;/h2&gt;

&lt;p&gt;MAGE's instructions advise using &lt;a href="https://docs.memgraph.com/memgraph/database-functionalities/triggers/" rel="noopener noreferrer"&gt;Memgraph Triggers&lt;/a&gt; for this spell to work. You can create a trigger that fires on edge creation: every time a new edge is added to the graph, the trigger fires and calls the &lt;strong&gt;Node2Vec Online&lt;/strong&gt; algorithm to update the node embeddings.&lt;/p&gt;

&lt;p&gt;To create the trigger, you can use the following command in &lt;code&gt;Memgraph Lab&lt;/code&gt; or &lt;code&gt;mgconsole&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;TRIGGER&lt;/span&gt; &lt;span class="n"&gt;trigger&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="o"&gt;--&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;BEFORE&lt;/span&gt; &lt;span class="k"&gt;COMMIT&lt;/span&gt;
&lt;span class="n"&gt;EXECUTE&lt;/span&gt; &lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;node2vec_online.update&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;createdEdges&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt; &lt;span class="k"&gt;YIELD&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="ss"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Before you start importing the dataset, MAGE's instructions carry a big reminder not to forget to set the parameters for the &lt;strong&gt;StreamWalk updater&lt;/strong&gt; and the &lt;strong&gt;Word2Vec learner&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;StreamWalk updater&lt;/strong&gt; uses the following parameters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;half_life&lt;/code&gt;: half-life [seconds], used in the temporal walk probability calculation&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;max_length&lt;/code&gt;: maximum length of the sampled temporal random walks&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;beta&lt;/code&gt;: damping factor for long paths&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;cutoff&lt;/code&gt;: temporal cutoff in seconds to exclude the very distant past&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;sampled_walks&lt;/code&gt;: number of sampled walks for each edge update&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;full_walks&lt;/code&gt;: return every node of the sampled walk (True) or only the endpoints of the walk (False)&lt;/li&gt;
&lt;/ul&gt;
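&lt;p&gt;To get a feeling for what &lt;code&gt;half_life&lt;/code&gt;, &lt;code&gt;cutoff&lt;/code&gt; and &lt;code&gt;beta&lt;/code&gt; control, here is a sketch of standard half-life decay with a cutoff and a damping factor for longer walks. The exact formula MAGE uses may differ; the function names and numbers below are illustrative.&lt;/p&gt;

```python
def edge_weight(delta_seconds, half_life, cutoff):
    """Exponential time decay: an edge that appeared `delta_seconds`
    ago keeps half of its weight after every `half_life` seconds,
    and is ignored entirely beyond `cutoff` seconds."""
    if delta_seconds > cutoff:
        return 0.0
    return 0.5 ** (delta_seconds / half_life)

def walk_weight(edge_deltas, half_life, cutoff, beta):
    """Weight of a whole walk: every extra hop is damped by `beta`,
    so longer paths contribute less."""
    w = 1.0
    for delta in edge_deltas:
        w *= beta * edge_weight(delta, half_life, cutoff)
    return w

# With half_life=7200 (two hours), an edge loses half its weight
# every two hours and is dropped after cutoff=604800 (one week).
print(edge_weight(7200, half_life=7200, cutoff=604800))    # 0.5
print(edge_weight(700000, half_life=7200, cutoff=604800))  # 0.0
```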

&lt;p&gt;All of these parameters control temporal walk sampling. You can now set them with the following command in &lt;code&gt;Memgraph Lab&lt;/code&gt; or &lt;code&gt;mgconsole&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;node2vec_online.set_streamwalk_updater&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;7200&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.9&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;604800&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="k"&gt;True&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;YIELD&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="ss"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Word2Vec learner&lt;/strong&gt; uses the following parameters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;embedding_dimension&lt;/code&gt;: number of dimensions in the representation of the embedding vector&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;learning_rate&lt;/code&gt;: learning rate&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;skip_gram&lt;/code&gt;: use skip-gram model&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;negative_rate&lt;/code&gt;: negative rate for skip-gram model&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;threads&lt;/code&gt;: maximum number of threads for parallelization&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These parameters are mostly used in the &lt;a href="http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/" rel="noopener noreferrer"&gt;skip-gram model&lt;/a&gt;.&lt;br&gt;
You can set the parameters with the following command in &lt;code&gt;Memgraph Lab&lt;/code&gt; or &lt;code&gt;mgconsole&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;node2vec_online.set_word2vec_learner&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt;&lt;span class="mf"&gt;0.01&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt;&lt;span class="k"&gt;True&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt; &lt;span class="k"&gt;YIELD&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="ss"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Loading a dataset
&lt;/h2&gt;

&lt;p&gt;Now, Memgraph and Node2Vec online are ready, and you can download and import the &lt;a href="https://download.memgraph.com/datasets/physics-citation-network/physics-citation-network.cypherl.gz" rel="noopener noreferrer"&gt;prepared dataset&lt;/a&gt; through your terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="n"&gt;mgconsole&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;use&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;ssl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;false&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;graph.cypherl&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After you execute this command, the trigger will fire upon every new edge added to the graph. Calculating the embeddings will take around a minute. From now on, whenever you add new edges, the &lt;code&gt;node2vec_online&lt;/code&gt; query module will update the embeddings, and they will be ready for use.&lt;/p&gt;

&lt;h2&gt;
  
  
  Recommendation system
&lt;/h2&gt;

&lt;p&gt;Here you will build a recommendation system and show MAGE how it is done.&lt;/p&gt;

&lt;p&gt;Before you continue, call the following command to check that the embeddings are ready. Again, execute the query in &lt;code&gt;Memgraph Lab&lt;/code&gt; or &lt;code&gt;mgconsole&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;node2vec_online.get&lt;/span&gt;&lt;span class="ss"&gt;()&lt;/span&gt; &lt;span class="k"&gt;YIELD&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt;
&lt;span class="k"&gt;RETURN&lt;/span&gt; &lt;span class="nf"&gt;count&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;embeddings_count&lt;/span&gt;&lt;span class="ss"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To create the recommendation engine, GitHub is offering you help. He heard that you want to impress MAGE and has some code already prepared &lt;a href="https://github.com/memgraph/physics-papers-recommender" rel="noopener noreferrer"&gt;here&lt;/a&gt;. You can trust him since he is also part of Memgraph. Download the code and get ready to spin it up.&lt;/p&gt;

&lt;p&gt;The recommendation engine is based on &lt;a href="https://scikit-learn.org/stable/modules/clustering.html#k-means" rel="noopener noreferrer"&gt;k-means from the scikit-learn package&lt;/a&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The k-means algorithm clusters data by trying to separate samples in &lt;code&gt;n&lt;/code&gt; groups of equal variance, minimizing a criterion known as the inertia or within-cluster sum-of-squares. This algorithm requires the number of clusters to be specified. It scales well to a large number of samples and has been used across a large range of application areas in many different fields.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In our procedure, we first get the embeddings from the &lt;code&gt;node2vec_online&lt;/code&gt; module. After that, using the &lt;a href="https://www.scikit-yb.org/en/latest/api/cluster/elbow.html" rel="noopener noreferrer"&gt;elbow method&lt;/a&gt; on the k-means inertia gives you the best &lt;code&gt;k&lt;/code&gt; value, which represents the number of clusters. GitHub, who offered you help, recommends using 5 clusters, but you can try out different numbers.&lt;/p&gt;
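&lt;p&gt;The quantity the elbow method plots against &lt;code&gt;k&lt;/code&gt; is the k-means inertia, i.e. the within-cluster sum of squared distances. Here is a minimal sketch of that computation on toy points; scikit-learn exposes the same value on a fitted model as the &lt;code&gt;inertia_&lt;/code&gt; attribute.&lt;/p&gt;

```python
def inertia(points, labels, centroids):
    """Within-cluster sum of squared distances: for each point, the
    squared distance to the centroid of its assigned cluster."""
    total = 0.0
    for point, label in zip(points, labels):
        center = centroids[label]
        total += sum((p - c) ** 2 for p, c in zip(point, center))
    return total

# Three toy 2-d points, two clusters (illustrative values).
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0)]
labels = [0, 0, 1]
centroids = [(0.0, 1.0), (10.0, 0.0)]
print(inertia(points, labels, centroids))  # 2.0
```

&lt;p&gt;As &lt;code&gt;k&lt;/code&gt; grows, inertia always shrinks; the elbow is the point where adding more clusters stops paying off.&lt;/p&gt;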

&lt;p&gt;You can visualize k-means inertia using this command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python3 recommender.py visualize
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We are using k-means here because we want to find which embeddings are close to each other in vector space, which should correspond to papers (nodes) that cover similar physics topics.&lt;/p&gt;
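&lt;p&gt;Similarity scores like the ones below are typically computed as cosine similarity between embedding vectors. A self-contained sketch with toy 4-dimensional vectors (the real embeddings from the module have the dimension you configured earlier):&lt;/p&gt;

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors:
    1.0 means identical direction, 0.0 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

emb_u = [0.9, 0.1, 0.4, 0.2]  # toy embeddings, not real paper vectors
emb_v = [0.8, 0.2, 0.5, 0.1]
print(round(cosine_similarity(emb_u, emb_v), 4))
```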

&lt;p&gt;After finding groups of similar papers, we are ready to get papers that are most similar by running the command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python3 recommender.py similarities &lt;span class="nt"&gt;--n_clusters&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;5 &lt;span class="nt"&gt;--top_n_sim&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Among the results, there is one example with 99.7% similarity.&lt;/p&gt;

&lt;p&gt;This is somewhat exaggerated, since we can't be sure from the graph alone that these two papers are this similar. But from the graph structure presented later, high similarity between these two nodes is expected, and from their descriptions we can see that the papers do cover similar topics. 99.7% is still a lot, but the algorithm works well! MAGE is impressed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

id: 9606040
title: Mirror Symmetry is T-Duality

It is argued that every Calabi-Yau manifold X with a mirror Y admits a
family of supersymmetric toroidal 3-cycles. Moreover the moduli space of
such cycles together with their flat connections is precisely the space Y.
The mirror transformation is equivalent to T-duality on the 3-cycles. The
geometry of moduli space is addressed in a general framework. Several
examples are discussed.

id: 9610195
title: Unification of M- and F- Theory Calabi-Yau Fourfold Vacua

We consider splitting type phase transitions between Calabi-Yau fourfolds.
These transitions generalize previously known types of conifold transitions
between threefolds. Similar to conifold configurations the singular
varieties mediating the transitions between fourfolds connect moduli spaces
of different dimensions, describing ground states in M- and F-theory with
different numbers of massless modes as well as different numbers of cycles
to wrap various p-branes around. The web of Calabi-Yau fourfolds obtained
in this way contains the class of all complete intersection manifolds
embedded in products of ordinary projective spaces, but extends also to
weighted configurations. It follows from this that for some of the fourfold
transitions vacua with vanishing superpotential are connected to ground
states with nonzero superpotential.

STATS
similarity: 0.9973
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Using Memgraph Lab, we can now visualize some of the data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="k"&gt;MATCH&lt;/span&gt; &lt;span class="n"&gt;connection&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;a:&lt;/span&gt;&lt;span class="n"&gt;Paper&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;bfs..10&lt;/span&gt; &lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="s1"&gt;'Paper'&lt;/span&gt; &lt;span class="ow"&gt;IN&lt;/span&gt; &lt;span class="nf"&gt;labels&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="ss"&gt;))]&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;b:&lt;/span&gt;&lt;span class="n"&gt;Paper&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a.id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"9610195"&lt;/span&gt; &lt;span class="ow"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;b.id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"9606040"&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt; 
    &lt;span class="n"&gt;OR&lt;/span&gt;&lt;span class="w"&gt; 
    &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a.id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"9606040"&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;b.id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"9610195"&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;RETURN&lt;/span&gt; &lt;span class="n"&gt;connection&lt;/span&gt;&lt;span class="ss"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fonline-node2vec-recommendation-system%2Fmemgraph-tutorial-lab-online-node2vec-recommendation-system.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fonline-node2vec-recommendation-system%2Fmemgraph-tutorial-lab-online-node2vec-recommendation-system.png" alt="memgraph-tutorial-lab-online-node2vec-recommendation-system"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Finding similar papers from the graph structure alone is a complicated task, but the results are very promising. You can take similar steps in different domains. The main advantage of this algorithm is that it works completely online, which is a great advantage in today's world, where more and more data is event-driven.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://memgraph.com/docs/mage" rel="noopener noreferrer"&gt;MAGE&lt;/a&gt; is a versatile open-source library containing standard graph algorithms that can help you analyze graph networks. While many graph libraries out there are great for performing graph computations, using MAGE and Memgraph provides you with additional benefits like persistent data storage and many other graph analytics capabilities.&lt;/p&gt;

&lt;p&gt;If you found this tutorial beneficial, you should try out the other algorithms included in the MAGE library.&lt;/p&gt;

&lt;p&gt;Finally, if you are working on your query module and would like to share it with other developers, take a look at the &lt;a href="https://github.com/memgraph/mage/blob/main/CONTRIBUTING.md" rel="noopener noreferrer"&gt;contributing guidelines&lt;/a&gt;. We would be more than happy to provide feedback and add the module to the &lt;a href="https://github.com/memgraph/mage" rel="noopener noreferrer"&gt;MAGE repository&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  References:
&lt;/h2&gt;

&lt;p&gt;[1] F. Béres, D. M. Kelen, R. Pálovics and A. A. Benczúr. Node embeddings in dynamic graphs. 2019. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://memgraph.com/blog?topics=Graph+Algorithms&amp;amp;utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=blog_repost&amp;amp;utm_content=banner#list" rel="noopener noreferrer"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw0azgpsgm3wp9w5sd5wu.png" alt="Read more about Graph Algorithms on memgraph.com"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Temporal Graph Neural Networks With Pytorch - How to Create a Simple Recommendation Engine on an Amazon Dataset</title>
      <dc:creator>Antonio Filipovic</dc:creator>
      <pubDate>Fri, 20 Jan 2023 13:37:10 +0000</pubDate>
      <link>https://dev.to/memgraph/temporal-graph-neural-networks-with-pytorch-how-to-create-a-simple-recommendation-engine-on-an-amazon-dataset-5g42</link>
      <guid>https://dev.to/memgraph/temporal-graph-neural-networks-with-pytorch-how-to-create-a-simple-recommendation-engine-on-an-amazon-dataset-5g42</guid>
      <description>&lt;h2&gt;
  
  
  PYTORCH x MEMGRAPH x GNN  = 💟
&lt;/h2&gt;

&lt;p&gt;Over the course of the last few months, we at &lt;strong&gt;Memgraph&lt;/strong&gt; have been working on something that we believe could be helpful with classical graph prediction tasks. With our latest newborn query module, you will have the option of performing both &lt;strong&gt;label classification&lt;/strong&gt; and &lt;strong&gt;link prediction&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;But, how come a query module can do both label classification and link prediction? It's all thanks to &lt;strong&gt;graph neural networks&lt;/strong&gt;, for short &lt;strong&gt;GNNs&lt;/strong&gt;. ❤️ &lt;/p&gt;

&lt;h2&gt;
  
  
  Graph neural networks
&lt;/h2&gt;

&lt;p&gt;Whether you are a software engineer or a deep learning enthusiast, there is a high chance you heard of &lt;strong&gt;&lt;a href="https://en.wikipedia.org/wiki/Graph_neural_network" rel="noopener noreferrer"&gt;graph neural networks&lt;/a&gt;&lt;/strong&gt; as a rising ⭐. Maybe you even deep-dived into this topic and are now ready for a &lt;strong&gt;&lt;a href="https://github.com/memgraph/mage" rel="noopener noreferrer"&gt;new MAGE spell&lt;/a&gt;&lt;/strong&gt;. But even if you haven't, don't worry, I will try to give you a quick overview so you can catch up and follow along.&lt;/p&gt;

&lt;p&gt;You probably already know that a graph consists of &lt;strong&gt;nodes (vertices)&lt;/strong&gt; and &lt;strong&gt;edges (relationships)&lt;/strong&gt;.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxg80q9wn8qqyjrbtxp8a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxg80q9wn8qqyjrbtxp8a.png" alt="amazon-user-item-recommender-with-tgn-and-memgraph" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Every node can have its own &lt;strong&gt;feature vector&lt;/strong&gt;, which essentially describes the node with a vector of numbers. We can look at this &lt;strong&gt;feature vector&lt;/strong&gt; as the &lt;strong&gt;representation vector&lt;/strong&gt; of the node, also called the &lt;strong&gt;embedding&lt;/strong&gt; of the node.&lt;br&gt;
Without getting lost in technical details: &lt;strong&gt;graph neural networks&lt;/strong&gt; work as a &lt;strong&gt;message passing&lt;sup&gt;[2]&lt;/sup&gt;&lt;/strong&gt; system, where each node aggregates the feature representations of its 1-hop neighbors. To be more precise, nodes don’t aggregate the feature representations directly, but feature vectors obtained by dimensionality reduction using the &lt;em&gt;W&lt;/em&gt; matrix (you can look at it as a fully connected linear layer). These matrix-projected feature vectors are called &lt;strong&gt;messages&lt;/strong&gt;, and they give graph neural networks their expressive power.&lt;/p&gt;
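&lt;p&gt;A toy sketch of one message passing round, assuming plain Python lists instead of tensors: each node averages the &lt;em&gt;W&lt;/em&gt;-projected feature vectors (the messages) of its 1-hop neighbors. Real GNN layers also add nonlinearities and the node's own features; this shows only the aggregation step.&lt;/p&gt;

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def message_passing_round(features, neighbors, W):
    """Each node aggregates the W-projected feature vectors
    (the 'messages') of its 1-hop neighbors by averaging them."""
    new_features = {}
    for node, nbrs in neighbors.items():
        messages = [matvec(W, features[n]) for n in nbrs]
        dim = len(W)
        new_features[node] = [
            sum(m[i] for m in messages) / len(messages) for i in range(dim)
        ]
    return new_features

# Toy graph: two nodes that are each other's neighbor,
# 2-d features and an illustrative 2x2 projection matrix.
features = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
neighbors = {"a": ["b"], "b": ["a"]}
W = [[2.0, 0.0], [0.0, 2.0]]
print(message_passing_round(features, neighbors, W))  # {'a': [0.0, 2.0], 'b': [2.0, 0.0]}
```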

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F04u6l3ym72i653udjxqt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F04u6l3ym72i653udjxqt.png" alt="amazon-user-item-recommender-with-tgn-and-memgraph" width="800" height="258"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This idea originates from the field of &lt;strong&gt;graph signal processing&lt;/strong&gt;. We don't have time here to explain how we got from &lt;strong&gt;signals&lt;/strong&gt; to &lt;strong&gt;message passing&lt;/strong&gt;, but it's all math.&lt;br&gt;
Feel free to drop us a message on &lt;a href="https://discord.gg/memgraph" rel="noopener noreferrer"&gt;Discord&lt;/a&gt; and we will make sure to create a blog post introducing &lt;strong&gt;graph neural networks&lt;/strong&gt; - not a simplified version, but one explaining everything from the beginning, starting somewhere around the &lt;a href="https://arxiv.org/pdf/0711.0189.pdf" rel="noopener noreferrer"&gt;Tutorial on Spectral Clustering&lt;/a&gt; from 2007.&lt;/p&gt;

&lt;p&gt;If you would like to get a better understanding of &lt;strong&gt;graph neural networks&lt;/strong&gt; before continuing, I suggest you check out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;this &lt;strong&gt;&lt;a href="https://distill.pub/2021/gnn-intro/" rel="noopener noreferrer"&gt;blog post&lt;/a&gt;&lt;/strong&gt; provides a gentle introduction to the topic&lt;/li&gt;
&lt;li&gt;you can also check &lt;strong&gt;&lt;a href="https://www.youtube.com/watch?v=6g9vtxUmfwM&amp;amp;list=PLoROMvodv4rPLKxIpqhjhPgdQy7imNkDn&amp;amp;index=14&amp;amp;ab_channel=StanfordOnline" rel="noopener noreferrer"&gt;the video explanation&lt;/a&gt;&lt;/strong&gt; by the Stanford professor Jure Leskovec about graph neural networks - I would honestly suggest to binge-watch the whole series, but if you don't have that much time, just watch the lectures called &lt;strong&gt;&lt;a href="https://www.youtube.com/watch?v=6g9vtxUmfwM&amp;amp;list=PLoROMvodv4rPLKxIpqhjhPgdQy7imNkDn&amp;amp;index=15&amp;amp;ab_channel=StanfordOnline" rel="noopener noreferrer"&gt;Message passing and Node Classification&lt;/a&gt;&lt;/strong&gt; and &lt;strong&gt;&lt;a href="https://www.youtube.com/watch?v=F3PgltDzllc&amp;amp;list=PLoROMvodv4rPLKxIpqhjhPgdQy7imNkDn&amp;amp;index=17&amp;amp;ab_channel=StanfordOnline" rel="noopener noreferrer"&gt;Introduction to Graph Neural Networks&lt;/a&gt;&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;and if you want to deep-dive, which I suggest, I will leave &lt;a href="https://gordicaleksa.medium.com/how-to-get-started-with-graph-machine-learning-afa53f6f963a" rel="noopener noreferrer"&gt;the following blog post&lt;/a&gt;, it will be more than enough.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The reason why we added GNNs to &lt;strong&gt;&lt;a href="https://github.com/memgraph/mage" rel="noopener noreferrer"&gt;MAGE&lt;/a&gt;&lt;/strong&gt; is that &lt;strong&gt;GNNs&lt;/strong&gt; are to &lt;strong&gt;graphs&lt;/strong&gt; what &lt;strong&gt;&lt;a href="https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53" rel="noopener noreferrer"&gt;CNNs&lt;/a&gt;&lt;/strong&gt; are to &lt;strong&gt;images&lt;/strong&gt;. &lt;strong&gt;GNNs&lt;/strong&gt; can inductively learn about your dataset, which means that after training is complete you can apply their knowledge to a similar use case, which is very cool since you don't need to retrain the whole algorithm. With other representation learning methods like &lt;strong&gt;&lt;a href="https://arxiv.org/abs/1403.6652" rel="noopener noreferrer"&gt;DeepWalk&lt;/a&gt;&lt;/strong&gt;, &lt;strong&gt;&lt;a href="https://arxiv.org/abs/1607.00653" rel="noopener noreferrer"&gt;Node2Vec&lt;/a&gt;&lt;/strong&gt;, &lt;strong&gt;&lt;a href="https://arxiv.org/abs/1603.08861" rel="noopener noreferrer"&gt;Planetoid&lt;/a&gt;&lt;/strong&gt;, we haven't been able to do that until, well, &lt;strong&gt;GNNs&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Now, why temporal graph neural networks?&lt;/strong&gt;&lt;br&gt;
Imagine you are in charge of a product where users interact with items every minute of every day; they like some and hate others. You would like to present them with more items they like - and not just that, you would love it if they bought those new items. This way you have a &lt;strong&gt;stream&lt;/strong&gt; of data. Interactions appear across time, so you are dealing with a &lt;strong&gt;temporal&lt;/strong&gt; dataset. Classical &lt;strong&gt;GNNs&lt;/strong&gt; are not designed to work with streams, although they work very well on unseen data. But having a hammer doesn't make every problem a nail - some methods work better on streams, others on static data. &lt;/p&gt;
&lt;h2&gt;
  
  
  Temporal graph networks
&lt;/h2&gt;

&lt;p&gt;As you already know, we in &lt;a href="https://memgraph.com?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=blog_repost" rel="noopener noreferrer"&gt;Memgraph&lt;/a&gt; are all about streams.&lt;/p&gt;

&lt;p&gt;The research team at &lt;a href="https://twitter.com/twitterresearch" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt; developed a &lt;strong&gt;GNN&lt;/strong&gt; that works on &lt;strong&gt;temporal&lt;/strong&gt; graphs. This way &lt;strong&gt;GNNs&lt;/strong&gt; can deal with &lt;strong&gt;continuous-time dynamic graphs&lt;/strong&gt;. In the image below you can see a schematic view of &lt;strong&gt;temporal graph networks&lt;/strong&gt;. It is a lot to take in, but the process, once explained, is not that complicated.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://public-assets.memgraph.com/amazon-user-item-recommender-with-tgn-and-memgraph/memgraph-temporal-graph-networks.jpg" rel="noopener noreferrer"&gt;&lt;br&gt;
&lt;img alt="TGN" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Famazon-user-item-recommender-with-tgn-and-memgraph%2Fmemgraph-temporal-graph-networks.jpg" width="800" height="525"&gt;&lt;br&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Firstly, in &lt;strong&gt;continuous-time dynamic graphs&lt;/strong&gt;, you can model changes on graphs that include &lt;strong&gt;edge or node addition&lt;/strong&gt;, &lt;strong&gt;edge or node feature transformation (update)&lt;/strong&gt;, &lt;strong&gt;edge or node deletion&lt;/strong&gt; as time-listed events.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://arxiv.org/pdf/2006.10637.pdf" rel="noopener noreferrer"&gt;Temporal graph networks&lt;sup&gt;[1]&lt;/sup&gt;&lt;/a&gt;&lt;/strong&gt;, shortened &lt;strong&gt;TGNs&lt;/strong&gt;, work as follows: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;node embedding calculations work on the concept of message passing, which I hope you are familiar with at this point&lt;/li&gt;
&lt;li&gt;TGNs use &lt;strong&gt;events&lt;/strong&gt;, and whenever a new edge appears, it represents an &lt;strong&gt;interaction event&lt;/strong&gt; between two nodes involved&lt;/li&gt;
&lt;li&gt;from every &lt;strong&gt;event&lt;/strong&gt;, we create a &lt;strong&gt;message&lt;/strong&gt; and use a &lt;strong&gt;message aggregator&lt;/strong&gt; for all messages of the same node to get the &lt;strong&gt;aggregated message&lt;/strong&gt; of every node&lt;/li&gt;
&lt;li&gt;every node has its own &lt;strong&gt;memory&lt;/strong&gt; which represents an &lt;strong&gt;accumulated&lt;/strong&gt; state, updated with the &lt;strong&gt;aggregated message&lt;/strong&gt; by an &lt;strong&gt;&lt;a href="https://towardsdatascience.com/illustrated-guide-to-lstms-and-gru-s-a-step-by-step-explanation-44e9eb85bf21" rel="noopener noreferrer"&gt;LSTM or GRU&lt;/a&gt;&lt;/strong&gt; cell
&lt;/li&gt;
&lt;li&gt;Lastly, the embedding module is used to generate the temporal embedding&lt;/li&gt;
&lt;/ul&gt;
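&lt;p&gt;The steps above can be sketched in a heavily simplified form. The real TGN uses learned message functions and a GRU/LSTM cell; here the message is just the raw event feature, aggregation is a mean, and the memory update is an exponential moving average - purely to show the data flow, not the actual model.&lt;/p&gt;

```python
def tgn_step(memory, events, alpha=0.5):
    """One batch step: turn interaction events into per-node messages,
    aggregate them (mean), and blend each node's memory with its
    aggregated message (a stand-in for the GRU/LSTM update)."""
    inbox = {}
    for src, dst, feature in events:
        # Each interaction event produces a message for both endpoints.
        inbox.setdefault(src, []).append(feature)
        inbox.setdefault(dst, []).append(feature)
    new_memory = dict(memory)
    for node, msgs in inbox.items():
        aggregated = sum(msgs) / len(msgs)   # message aggregator
        old = memory.get(node, 0.0)
        new_memory[node] = (1 - alpha) * old + alpha * aggregated
    return new_memory

# Toy events: user "u" interacts with item "i" twice in one batch,
# with scalar event features (real TGN uses feature vectors).
memory = {"u": 0.0, "i": 0.0}
events = [("u", "i", 1.0), ("u", "i", 3.0)]
print(tgn_step(memory, events))  # {'u': 1.0, 'i': 1.0}
```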

&lt;p&gt;There are two embedding module types we integrated into our &lt;strong&gt;TGN&lt;/strong&gt; implementation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Graph attention layer&lt;/strong&gt;: it is a similar concept as in &lt;strong&gt;&lt;a href="https://arxiv.org/abs/1710.10903" rel="noopener noreferrer"&gt;Graph attention networks&lt;/a&gt;&lt;/strong&gt;, but here they use the original idea from Vaswani et al. &lt;strong&gt;&lt;a href="https://arxiv.org/abs/1706.03762" rel="noopener noreferrer"&gt;Attention is all you need&lt;/a&gt;&lt;/strong&gt; which includes queries, keys and values and everything else is the same. I suggest you look at the &lt;strong&gt;TGN&lt;/strong&gt; paper to check the exact embedding calculation details.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Graph sum layer&lt;/strong&gt;: this mechanism closely resembles the standard &lt;strong&gt;message passing&lt;/strong&gt; scheme&lt;/li&gt;
&lt;/ul&gt;
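&lt;p&gt;To make the graph sum idea concrete, here is a minimal scalar version of one message-passing step. This is a sketch with made-up weights: the actual layer applies learned weight matrices to feature vectors, not scalars.&lt;/p&gt;

```python
# Toy graph-sum layer over scalar node features:
# h'_i = w_self * h_i + w_nbr * sum of neighbor features.
def graph_sum_layer(features, adjacency, w_self=1.0, w_nbr=0.5):
    return {
        node: w_self * features[node]
        + w_nbr * sum(features[nbr] for nbr in adjacency[node])
        for node in features
    }

# A tiny undirected graph: "a" is connected to "b" and "c".
features = {"a": 1.0, "b": 2.0, "c": 3.0}
adjacency = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
out = graph_sum_layer(features, adjacency)
# node "a": 1.0 + 0.5 * (2.0 + 3.0) = 3.5
```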

&lt;p&gt;There is a subtle problem when dealing with &lt;strong&gt;embedding updates&lt;/strong&gt;. We don't update embeddings for every node, only for the ones that appear in a batch. Without going too deep into implementation details, the problem is this: nodes in the batch appear at different points in time, so when computing a node's embedding we must take its update time into account and only use neighbors that appeared in the graph before it. This matters because we first update the whole graph representation with the batch information and only then do the calculation, which is what we did. As a result, every node gets its own computation graph. You can see what it looks like in the image below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk5k3zc678kex7et0op0d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk5k3zc678kex7et0op0d.png" alt="memgraph-temporal-computation-graph" width="800" height="539"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Amazon data example
&lt;/h2&gt;

&lt;p&gt;To try out how this works, we have prepared a Jupyter Notebook on our &lt;strong&gt;&lt;a href="https://github.com/memgraph/jupyter-memgraph-tutorials" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;&lt;/strong&gt;. It is based on the &lt;a href="http://snap.stanford.edu/data/amazon/productGraph/" rel="noopener noreferrer"&gt;Amazon user-item reviews&lt;/a&gt; dataset. In the following example, you will see how to do &lt;strong&gt;link prediction&lt;/strong&gt; with &lt;strong&gt;TGN&lt;/strong&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  Exploring an Amazon data network in Memgraph
&lt;/h2&gt;

&lt;p&gt;Through this short tutorial, you will learn how to install Memgraph, connect to it from a Jupyter Notebook and perform data analysis on an Amazon dataset using a &lt;strong&gt;graph neural network&lt;/strong&gt; called &lt;strong&gt;Temporal graph networks&lt;/strong&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  1. Prerequisites
&lt;/h3&gt;

&lt;p&gt;For this tutorial, you will need to install:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://jupyter.org/install" rel="noopener noreferrer"&gt;Jupyter&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.docker.com/get-docker/" rel="noopener noreferrer"&gt;Docker&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://pypi.org/project/gqlalchemy/" rel="noopener noreferrer"&gt;GQLAlchemy&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Docker is used because Memgraph is a native Linux application and cannot be installed natively on Windows or macOS.&lt;/p&gt;
&lt;h3&gt;
  
  
  2. Installation using Docker
&lt;/h3&gt;

&lt;p&gt;After installing Docker, you can set up Memgraph by running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker run -it -p 7687:7687 -p 3000:3000 -p 7444:7444 memgraph/memgraph-platform
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command will download the image and, once the download finishes, start the Memgraph container.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Connecting to Memgraph with GQLAlchemy
&lt;/h3&gt;

&lt;p&gt;We will be using the &lt;strong&gt;GQLAlchemy&lt;/strong&gt; object graph mapper (OGM) to connect to Memgraph and execute &lt;strong&gt;Cypher&lt;/strong&gt; queries easily. GQLAlchemy also serves as a Python driver/client for Memgraph. You can install it using:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install gqlalchemy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Hint&lt;/strong&gt;: You may need to install &lt;a href="https://cmake.org/download/" rel="noopener noreferrer"&gt;CMake&lt;/a&gt; before installing GQLAlchemy.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Maybe you got confused when I mentioned Cypher. You can think of Cypher as SQL for graph databases. It contains many of the same language constructs like &lt;code&gt;CREATE&lt;/code&gt;, &lt;code&gt;UPDATE&lt;/code&gt;, &lt;code&gt;DELETE&lt;/code&gt;... and it's used to query the database.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;gqlalchemy&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Memgraph&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;memgraph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Memgraph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;127.0.0.1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;7687&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's make sure that Memgraph is empty before we start with anything else.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;memgraph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;drop_database&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The following command should output &lt;code&gt;{number_of_nodes: 0}&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memgraph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute_and_fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    MATCH (n) RETURN count(n) AS number_of_nodes ;
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;next&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. Data analysis on an Amazon product dataset
&lt;/h3&gt;

&lt;p&gt;You will load the &lt;strong&gt;Amazon product dataset&lt;/strong&gt; as a list of Cypher queries. This is what it looks like:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://public-assets.memgraph.com/amazon-user-item-recommender-with-tgn-and-memgraph/memgraph-amazon-product-dataset.png" rel="noopener noreferrer"&gt;&lt;br&gt;
&lt;img alt="dataset" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Famazon-user-item-recommender-with-tgn-and-memgraph%2Fmemgraph-amazon-product-dataset.png" width="800" height="612"&gt;&lt;br&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;An example of the aforementioned queries is the following one:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="k"&gt;MERGE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;a:&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt; &lt;span class="ss"&gt;{&lt;/span&gt;&lt;span class="py"&gt;id:&lt;/span&gt; &lt;span class="s1"&gt;'A1BHUGKLYW6H7V'&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="py"&gt;profile_name:&lt;/span&gt;&lt;span class="s1"&gt;'P. Lecuyer'&lt;/span&gt;&lt;span class="ss"&gt;})&lt;/span&gt;
&lt;span class="k"&gt;MERGE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;b:&lt;/span&gt;&lt;span class="n"&gt;Item&lt;/span&gt; &lt;span class="ss"&gt;{&lt;/span&gt;&lt;span class="py"&gt;id:&lt;/span&gt; &lt;span class="s1"&gt;'B0007MCVQ2'&lt;/span&gt;&lt;span class="ss"&gt;})&lt;/span&gt;
&lt;span class="k"&gt;MERGE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;:REVIEWED&lt;/span&gt; &lt;span class="ss"&gt;{&lt;/span&gt;&lt;span class="py"&gt;review_text:&lt;/span&gt;&lt;span class="s1"&gt;'Like all Clarks, these guys didnt disappoint. They fit great and look even better. For the price, I dont think a better deal exists out there for casual shoes.'&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt;
  &lt;span class="py"&gt;feature:&lt;/span&gt; &lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;161.0&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;133.0&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.782608695652174&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.031055900621118012&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.17391304347826086&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.043478260869565216&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;36.0&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;36.0&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;3.6944444444444446&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;3.0&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;12.0&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.055&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.519&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.427&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.9238&lt;/span&gt;&lt;span class="ss"&gt;],&lt;/span&gt;
  &lt;span class="nl"&gt;review_time&lt;/span&gt;&lt;span class="dl"&gt;:&lt;/span&gt;&lt;span class="m"&gt;1127088000&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="nl"&gt;review_score&lt;/span&gt;&lt;span class="dl"&gt;:&lt;/span&gt;&lt;span class="m"&gt;5.0&lt;/span&gt;&lt;span class="ss"&gt;}]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="ss"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As you can see, our graph schema has &lt;code&gt;User&lt;/code&gt; nodes and &lt;code&gt;Item&lt;/code&gt; nodes. Every user has left a very &lt;strong&gt;positive&lt;/strong&gt; review for an &lt;strong&gt;Item&lt;/strong&gt;. This wasn't the case for all the reviews in the original dataset, but we processed it and removed the negative ones (all reviews with &lt;code&gt;review_score&lt;/code&gt; &amp;lt;= 3.0).&lt;br&gt;
Every &lt;code&gt;User&lt;/code&gt; has an &lt;code&gt;id&lt;/code&gt;, and so does every reviewed &lt;code&gt;Item&lt;/code&gt;. In this single query, we match the &lt;code&gt;User&lt;/code&gt; and the &lt;code&gt;Item&lt;/code&gt; with the given ids, or create them if they are missing from the database. We then create an interaction event between them in the form of an &lt;code&gt;edge&lt;/code&gt; carrying a list of &lt;strong&gt;20&lt;/strong&gt; edge features. We created these edge features from the user reviews:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Number of characters
2. Number of characters without counting white space
3. Fraction of alphabetical characters
4. Fraction of digits
5. Fraction of uppercase characters
6. Fraction of white spaces
7. Fraction of special characters, such as comma, exclamation mark, etc.
8. Number of words
9. Number of unique words
10. Number of long words (at least 6 characters)
11. Average word length
12. Number of unique stopwords
13. Fraction of stopwords
14. Number of sentences
15. Number of long sentences (at least 10 words)
16. Average number of words per sentence
17. Positive sentiment calculated by VADER
18. Negative sentiment calculated by VADER
19. Neutral sentiment calculated by VADER
20. Compound sentiment calculated by VADER
# VADER - Valence Aware Dictionary and sEntiment Reasoner,
# a lexicon and rule-based sentiment analysis tool
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
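&lt;p&gt;Most of these features can be computed with plain Python string operations. The sketch below covers a handful of them; the VADER sentiment scores (features 17-20) require an external sentiment analysis package and are omitted.&lt;/p&gt;

```python
# Sketch: compute a few of the review-text features listed above
# using only the standard library. Sentiment features are omitted.
def review_features(text):
    chars = len(text)
    words = text.split()
    return {
        "num_chars": chars,
        "num_chars_no_whitespace": len(text.replace(" ", "")),
        "frac_alpha": sum(c.isalpha() for c in text) / chars,
        "frac_digits": sum(c.isdigit() for c in text) / chars,
        "frac_upper": sum(c.isupper() for c in text) / chars,
        "num_words": len(words),
        "num_unique_words": len(set(w.lower() for w in words)),
        "num_long_words": sum(len(w) >= 6 for w in words),
        "avg_word_length": sum(len(w) for w in words) / len(words),
    }

feats = review_features("Great shoes, they fit great")
```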



&lt;p&gt;We could also have prepared node features for each &lt;code&gt;User&lt;/code&gt; and &lt;code&gt;Item&lt;/code&gt;, but the edge features seemed sufficient for our example.&lt;/p&gt;

&lt;p&gt;One more &lt;strong&gt;note&lt;/strong&gt;: the dataset of queries we prepared for you contains one query that switches the "working mode" of our &lt;strong&gt;temporal graph networks&lt;/strong&gt; module to &lt;strong&gt;evaluation (eval)&lt;/strong&gt; mode. Once the mode changes, the &lt;strong&gt;tgn&lt;/strong&gt; module stops &lt;strong&gt;training&lt;/strong&gt; the model and starts &lt;strong&gt;evaluating&lt;/strong&gt; it.&lt;br&gt;
If you look inside the file, you should find the following query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;tgn.set_mode&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"eval"&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt; &lt;span class="k"&gt;YIELD&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="ss"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Trigger creation
&lt;/h4&gt;

&lt;p&gt;In order to process the dataset, we need to create a trigger on the &lt;strong&gt;edge create&lt;/strong&gt; event, but only if a trigger with that name doesn't already exist.&lt;/p&gt;

&lt;p&gt;This check is handy if you are running a &lt;strong&gt;local Memgraph&lt;/strong&gt; instance rather than &lt;strong&gt;Docker&lt;/strong&gt;: it lets you rerun the Jupyter Notebook without dropping the database first.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memgraph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute_and_fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SHOW TRIGGERS;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;trigger_exists&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;trigger name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;create_embeddings&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Trigger already exists&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;trigger_exists&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;trigger_exists&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;memgraph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        CREATE TRIGGER create_embeddings ON --&amp;gt; CREATE BEFORE COMMIT
        EXECUTE CALL tgn.update(createdEdges) RETURN 1;
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Index creation for dataset
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Memgraph&lt;/strong&gt; works best with &lt;strong&gt;indexes&lt;/strong&gt; defined for nodes. In our case, we will create indexes for &lt;strong&gt;User&lt;/strong&gt; and &lt;strong&gt;Item&lt;/strong&gt; nodes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;index_queries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CREATE INDEX ON :User(id);&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CREATE INDEX ON :Item(id);&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;index_queries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memgraph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute_and_fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;continue&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Training and evaluating Temporal Graph Networks
&lt;/h4&gt;

&lt;p&gt;In order to train a &lt;strong&gt;Temporal graph network&lt;/strong&gt; on an &lt;strong&gt;Amazon dataset&lt;/strong&gt;, we will split the dataset into &lt;strong&gt;train&lt;/strong&gt; and &lt;strong&gt;eval&lt;/strong&gt; queries. Let's first load our &lt;strong&gt;raw queries&lt;/strong&gt;. Each query creates an &lt;strong&gt;edge&lt;/strong&gt; between &lt;strong&gt;User&lt;/strong&gt; and &lt;strong&gt;Item&lt;/strong&gt; thus representing a &lt;strong&gt;positive&lt;/strong&gt; review of a certain &lt;strong&gt;Item&lt;/strong&gt; by a &lt;strong&gt;User&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="n"&gt;dir_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getcwd&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;dir_path&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/data/queries.cypherl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;fh&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;raw_queries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fh&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;readlines&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;train_eval_split_ratio&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.8&lt;/span&gt;
&lt;span class="n"&gt;queries_index_split&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_queries&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;train_eval_split_ratio&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;train_queries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;raw_queries&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="n"&gt;queries_index_split&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;eval_queries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;raw_queries&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;queries_index_split&lt;/span&gt;&lt;span class="p"&gt;:]&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Num of train queries &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;train_queries&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Num of eval queries &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;eval_queries&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Before we start importing &lt;strong&gt;train&lt;/strong&gt; queries, first we need to set parameters for &lt;strong&gt;temporal graph networks&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# since we are doing link prediction, we use self_supervised mode
&lt;/span&gt;&lt;span class="n"&gt;learning_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;self_supervised&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;batch_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt; &lt;span class="c1"&gt;#optimal size as defined in paper
&lt;/span&gt;&lt;span class="n"&gt;num_of_layers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="c1"&gt;# GNNs don't need multiple layers, contrary to CNNs.
&lt;/span&gt;&lt;span class="n"&gt;layer_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;graph_attn&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="c1"&gt;# choose between graph_attn or graph_sum
&lt;/span&gt;&lt;span class="n"&gt;edge_message_function_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;identity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="c1"&gt;# choose between identity or mlp
&lt;/span&gt;&lt;span class="n"&gt;message_aggregator_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;last&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="c1"&gt;# choose between last or mean
&lt;/span&gt;&lt;span class="n"&gt;memory_updater_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gru&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="c1"&gt;# choose between gru or rnn
&lt;/span&gt;&lt;span class="n"&gt;attention_heads&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="n"&gt;memory_dimension&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
&lt;span class="n"&gt;time_dimension&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
&lt;span class="n"&gt;num_edge_features&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;
&lt;span class="n"&gt;num_node_features&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;
&lt;span class="c1"&gt;# number of sampled neighbors
&lt;/span&gt;&lt;span class="n"&gt;num_neighbors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;
&lt;span class="c1"&gt;# message dimension must be defined in the case we use MLP, 
# because then we define dimension of **projection**
&lt;/span&gt;&lt;span class="n"&gt;message_dimension&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time_dimension&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;num_node_features&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;num_edge_features&lt;/span&gt;

&lt;span class="n"&gt;tgn_param_query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CALL tgn.set_params({{learning_type:&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;learning_type&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, 
batch_size: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, num_of_layers:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;num_of_layers&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, 
layer_type:&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;layer_type&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, memory_dimension:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;memory_dimension&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, 
time_dimension:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;time_dimension&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, num_edge_features:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;num_edge_features&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, 
num_node_features:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;num_node_features&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, message_dimension:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;message_dimension&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;,
num_neighbors:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;num_neighbors&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, 
edge_message_function_type:&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;edge_message_function_type&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;,
message_aggregator_type:&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;message_aggregator_type&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;,
memory_updater_type:&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;memory_updater_type&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, 
attention_heads:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;attention_heads&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;}}) 
YIELD *;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TGN param query: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tgn_param_query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memgraph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute_and_fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tgn_param_query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now it is time to execute queries and perform the &lt;strong&gt;first&lt;/strong&gt; epoch of &lt;strong&gt;training&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;train_queries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memgraph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute_and_fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;continue&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we need to switch &lt;strong&gt;TGN&lt;/strong&gt; to &lt;strong&gt;eval&lt;/strong&gt; mode and start running our &lt;strong&gt;evaluation&lt;/strong&gt; queries.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memgraph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute_and_fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CALL tgn.set_eval() YIELD *;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;eval_queries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memgraph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute_and_fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;continue&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After our &lt;strong&gt;stream&lt;/strong&gt; is done, we should run a few more rounds of training and evaluation so the model ends up properly trained. We can do so with the following query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;num_of_epochs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;

&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memgraph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute_and_fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    CALL tgn.train_and_eval(&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;num_of_epochs&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;) YIELD *
    RETURN epoch_num, batch_num, precision, batch_process_time, batch_type 
    ORDER BY epoch_num, batch_num;
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;continue&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, let's get the results and then do some &lt;strong&gt;plotting&lt;/strong&gt; to check whether the &lt;strong&gt;precision&lt;/strong&gt; increases between epochs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;collections&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;defaultdict&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;

&lt;span class="n"&gt;results_train_dict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;defaultdict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;results_eval_dict&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;defaultdict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memgraph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute_and_fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        CALL tgn.get_results() 
        YIELD  epoch_num, batch_num, precision, batch_process_time, batch_type
        RETURN epoch_num, batch_num, precision, batch_process_time, batch_type 
        ORDER BY epoch_num, batch_num;
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;batch_type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Train&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;results_train_dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;epoch_num&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]].&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;precision&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;results_eval_dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;epoch_num&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]].&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;precision&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now that we have collected the results, let's plot the average &lt;code&gt;precision&lt;/code&gt; of the &lt;code&gt;train&lt;/code&gt; batches and the average &lt;code&gt;precision&lt;/code&gt; of the &lt;code&gt;eval&lt;/code&gt; batches within each epoch. Averaging over batches is valid because every batch is the same size (&lt;strong&gt;NB&lt;/strong&gt;: &lt;code&gt;TGN&lt;/code&gt; uses a predefined batch size).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;X_train&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="n"&gt;Y_train&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;epoch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batches_precision&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results_train_dict&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;Y_train&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batches_precision&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;epoch&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;X_eval&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="n"&gt;Y_eval&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;epoch&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batches_precision&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results_eval_dict&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;Y_eval&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batches_precision&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;X_eval&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;epoch&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="c1"&gt;#scatter plot
&lt;/span&gt;&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Y_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;b&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;train&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;plot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;X_eval&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Y_eval&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;eval&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;#add title
&lt;/span&gt;&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;epoch - average batch precision&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;#add x and y labels
&lt;/span&gt;&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;xlabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;epoch&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ylabel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;precision&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;legend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;loc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;upper left&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;#show plot
&lt;/span&gt;&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxnhb6hrvnznpeir3kyym.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxnhb6hrvnznpeir3kyym.png" alt="memgraph-average-batch-precision" width="392" height="278"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can see that the &lt;strong&gt;average precision&lt;/strong&gt; increases, which is really good. Now we can start creating some recommendations. Let's find &lt;code&gt;Users&lt;/code&gt; who reviewed one &lt;code&gt;Item&lt;/code&gt; positively and those who reviewed multiple &lt;code&gt;Items&lt;/code&gt; positively. The module will return a prediction score for each &lt;code&gt;Item&lt;/code&gt; they haven't reviewed yet.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memgraph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute_and_fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        MATCH (n:User)
        WITH n
        LIMIT 15
        MATCH (m:Item)
        OPTIONAL MATCH (n)-[r]-&amp;gt;(m)
        WHERE r is null
        CALL tgn.predict_link_score(n,m) YIELD prediction
        WITH n,m, prediction
        ORDER BY prediction DESC
        LIMIT 10
        MERGE (n)-[:PREDICTED_REVIEW {likelihood:prediction}]-&amp;gt;(m);

    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we can run the following query in Memgraph Lab:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;u:&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="py"&gt;pr:&lt;/span&gt;&lt;span class="n"&gt;PREDICTED_REVIEW&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;i:&lt;/span&gt;&lt;span class="n"&gt;Item&lt;/span&gt;&lt;span class="ss"&gt;),&lt;/span&gt; &lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="py"&gt;r:&lt;/span&gt;&lt;span class="n"&gt;REVIEWED&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;oi:&lt;/span&gt;&lt;span class="n"&gt;Item&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;RETURN&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="ss"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And after &lt;a href="https://github.com/memgraph/jupyter-memgraph-tutorials/tree/main/pytorch_amazon_network_analysis" rel="noopener noreferrer"&gt;applying a style&lt;/a&gt;, we get the following visualization. From the image below, we can see that most predictions are oriented towards one of the most popular items. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://public-assets.memgraph.com/amazon-user-item-recommender-with-tgn-and-memgraph/memgraph-amazon-product-link-prediction.png" rel="noopener noreferrer"&gt;&lt;br&gt;
&lt;img alt="dataset" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Famazon-user-item-recommender-with-tgn-and-memgraph%2Fmemgraph-amazon-product-link-prediction.png" width="800" height="391"&gt;&lt;br&gt;
&lt;/a&gt;&lt;/p&gt;
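&lt;p&gt;To quantify that observation, we could aggregate the freshly created &lt;code&gt;PREDICTED_REVIEW&lt;/code&gt; relationships and rank items by how many predictions point at them. A minimal sketch, assuming the relationships created above:&lt;/p&gt;

```cypher
MATCH (:User)-[p:PREDICTED_REVIEW]->(i:Item)
RETURN i, count(p) AS num_predictions
ORDER BY num_predictions DESC;
```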

&lt;h2&gt;
  
  
  Where to next?
&lt;/h2&gt;

&lt;p&gt;Well, I hope this was fun and that you have learned something. You can check out everything else that we implemented in the &lt;strong&gt;&lt;a href="https://memgraph.com/blog/mage-1-2-release" rel="noopener noreferrer"&gt;MAGE 1.2 release&lt;/a&gt;&lt;/strong&gt;. If you loved our implementation, don't hesitate to give us a star on &lt;strong&gt;&lt;a href="https://github.com/memgraph/mage" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/strong&gt; ⭐. If you have any comments or suggestions, you can contact us on &lt;strong&gt;&lt;a href="https://discord.gg/memgraph" rel="noopener noreferrer"&gt;Discord&lt;/a&gt;&lt;/strong&gt;. And lastly, if you wish to continue reading posts about graph analytics, check out &lt;strong&gt;&lt;a href="https://memgraph.com/blog" rel="noopener noreferrer"&gt;our blog&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;p&gt;[1] E. Rossi, B. Chamberlain, F. Frasca, D. Eynard, F. Monti, M. Bronstein (2020). Temporal Graph Networks for Deep Learning on Dynamic Graphs&lt;/p&gt;

&lt;p&gt;[2] W.L. Hamilton, R. Ying, and J. Leskovec. Inductive representation learning on large&lt;br&gt;
graphs. In Advances in Neural Information Processing Systems 30, 2017.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://memgraph.com/blog?topics=Graph+Algorithms&amp;amp;utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=blog_repost&amp;amp;utm_content=banner#list" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw0azgpsgm3wp9w5sd5wu.png" alt="Read more about graph algorithms on memgraph.com" width="800" height="169"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>softwaredevelopment</category>
      <category>development</category>
    </item>
    <item>
      <title>3 Powerful Queries to Find Patterns in Your Knowledge Graph You Haven’t Noticed Before</title>
      <dc:creator>Antonio Filipovic</dc:creator>
      <pubDate>Fri, 13 Jan 2023 14:07:42 +0000</pubDate>
      <link>https://dev.to/memgraph/3-powerful-queries-to-find-patterns-in-your-knowledge-graph-you-havent-noticed-before-4ba7</link>
      <guid>https://dev.to/memgraph/3-powerful-queries-to-find-patterns-in-your-knowledge-graph-you-havent-noticed-before-4ba7</guid>
      <description>&lt;p&gt;Today, not a lot of companies worry about a lack of data. Everything is logged and stored across different databases and technologies. The current issue is that companies can’t conclude anything from all that data, which is especially disastrous if the data indicates that it’s time for the company to change how it does business.&lt;/p&gt;

&lt;p&gt;Creating a large web of interconnected data as a graph is the crucial first step for companies to get a complete picture of their business and understand its direction. &lt;/p&gt;

&lt;p&gt;First, data is gathered in one place. Then a graph is created with a semantics layer placed on top, containing information about the data, thus &lt;a href="https://memgraph.com/blog/inferring-knowledge-from-unused-siloed-stores-using-graphs?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=blog_repost" rel="noopener noreferrer"&gt;creating a knowledge graph&lt;/a&gt;. Analyzing the knowledge graph uncovers new information in the data. &lt;/p&gt;

&lt;p&gt;To create a knowledge graph, you must be careful about which toolset you choose. If you need to combine several different solutions, it becomes impossible to gather the data completely, and thus impossible to analyze it. The ultimate result is slow decision-making. Graph databases are an excellent tool because they are &lt;a href="https://memgraph.com/blog/4-reasons-why-graph-tech-is-great-for-knowledge-graphs?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=blog_repost" rel="noopener noreferrer"&gt;designed to explore relationships and hop through data&lt;/a&gt;. With tools that cannot explore relationships this way, even the first few steps of the analysis require a lot of coding.&lt;/p&gt;

&lt;p&gt;As a graph database, Memgraph fits perfectly into the knowledge graph use case. It also offers &lt;a href="https://memgraph.com/mage/?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=blog_repost" rel="noopener noreferrer"&gt;free and open-source graph analytics algorithms&lt;/a&gt;. Since every business deals with its own unique problems, no single algorithm solves all of them equally well. That’s why the library offers an extensive range of analytics algorithms. The best part is that your team doesn’t have to worry about writing a single line of code. They can focus on creating knowledge graphs and making sense of the data.&lt;/p&gt;

&lt;p&gt;Graph analytics is just one of many big pluses. Memgraph is also an in-memory graph database, which means you won’t have to wait a whole day for graph analytics algorithms to spit out results, nor settle for obsolete ones. And being in-memory doesn’t mean your data can be lost, since backups and data persistence are handled through disk storage.&lt;/p&gt;

&lt;p&gt;Modeling data using property graphs, as you would with Memgraph, makes sense since you can see all the nodes and how they relate to each other by examining their relationships. After dealing with relational databases for years, it’s hard even to imagine what complex questions graph databases can answer. Relational databases cannot provide such answers, or the execution takes so much time you don’t even try it. &lt;/p&gt;

&lt;p&gt;In the rest of this blog post, you will see what algorithms you can use to ask specific questions and query the graph database, which will help you uncover knowledge hidden within your data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pattern-matching questions
&lt;/h2&gt;

&lt;p&gt;The easiest way to discover new knowledge in graphs is pattern matching. Pattern matching is a basic exploration of data in which the database searches for nodes with a specific label connected with a relationship of a specific type to another node. In other words, it searches for the shape of the data you defined and retrieves results.&lt;/p&gt;
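&lt;p&gt;Conceptually, pattern matching is a filter over the shape of the data. A hand-rolled Python sketch over a toy edge list (the labels, relationship types, and the &lt;code&gt;match&lt;/code&gt; helper here are purely illustrative, not Memgraph's API) makes the idea concrete:&lt;/p&gt;

```python
# Toy graph stored as (src_label, src, rel_type, dst_label, dst) tuples.
edges = [
    ("Account", "acc1", "TRANSFERRED_TO", "Account", "acc2"),
    ("Account", "acc2", "FLAGGED_AS", "FraudulentActivity", "f1"),
    ("User", "u1", "REVIEWED", "Item", "i1"),
]

def match(edges, src_label, rel_type, dst_label):
    """Return (src, dst) pairs whose shape is (src_label)-[rel_type]->(dst_label)."""
    return [
        (src, dst)
        for sl, src, rel, dl, dst in edges
        if sl == src_label and rel == rel_type and dl == dst_label
    ]

print(match(edges, "Account", "FLAGGED_AS", "FraudulentActivity"))  # [('acc2', 'f1')]
```

&lt;p&gt;A graph database performs exactly this kind of shape search, but index-backed and over billions of relationships instead of a Python list.&lt;/p&gt;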

&lt;h3&gt;
  
  
  Content and connectedness
&lt;/h3&gt;

&lt;p&gt;In finance, companies often question whether an account is connected to another account known for fraudulent activity. If one account is connected to fraudulent activity, all the intermediary accounts might also be connected to it, like a neverending string of handkerchiefs. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx04d4zgo7oh78vyb1meu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx04d4zgo7oh78vyb1meu.png" alt="3-powerful-queries-to-find-patterns-in-your-knowledge-graph-you-havent-noticed-before" width="800" height="895"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With Memgraph and basic pattern matching, you can uncover such connections with the following query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="k"&gt;MATCH&lt;/span&gt; &lt;span class="n"&gt;p1&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;n:&lt;/span&gt;&lt;span class="n"&gt;Node1&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;m:&lt;/span&gt;&lt;span class="n"&gt;Node2&lt;/span&gt;&lt;span class="ss"&gt;),&lt;/span&gt; &lt;span class="n"&gt;p2&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="ss"&gt;),&lt;/span&gt; &lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;f:&lt;/span&gt;&lt;span class="n"&gt;FraudulantActivity&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;p1&lt;/span&gt;&lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;p2&lt;/span&gt;
&lt;span class="k"&gt;RETURN&lt;/span&gt; &lt;span class="nf"&gt;nodes&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p1&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="nf"&gt;nodes&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p2&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The query above looks for two different paths, &lt;code&gt;p1&lt;/code&gt; and &lt;code&gt;p2&lt;/code&gt;, between the node named &lt;code&gt;n&lt;/code&gt; and the node named &lt;code&gt;m&lt;/code&gt;, and returns the nodes on those paths. As mentioned above, such nodes could be part of some fraudulent activity.&lt;/p&gt;

&lt;h3&gt;
  
  
  Commonality
&lt;/h3&gt;

&lt;p&gt;Graph analytics makes it easier to reason about cause and effect. Knowledge graphs in particular make it possible to use a priori knowledge about business processes to infer new knowledge.&lt;/p&gt;

&lt;p&gt;In finance, when the same person controls certain companies, it can be a hint of illegal activities. It’s quite hard to find a common denominator using a relational database since you can’t know in advance how deep into company ownership you need to dive, and the number of owners may vary.&lt;/p&gt;

&lt;p&gt;Graph databases can search for a common ancestor or successor of a certain entity, like in the picture below, regardless of the depth or number of companies the database needs to search through:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F96piukmdoj45tv7djv4n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F96piukmdoj45tv7djv4n.png" alt="3-powerful-queries-to-find-patterns-in-your-knowledge-graph-you-havent-noticed-before/memgraph-commonality" width="800" height="613"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A query written in Cypher would be as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;n1:&lt;/span&gt;&lt;span class="n"&gt;Node1&lt;/span&gt; &lt;span class="ss"&gt;{&lt;/span&gt;&lt;span class="py"&gt;prop:&lt;/span&gt;&lt;span class="s2"&gt;"a"&lt;/span&gt;&lt;span class="ss"&gt;})&lt;/span&gt;
&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;graph_util.descendants&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n1&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;  &lt;span class="k"&gt;YIELD&lt;/span&gt; &lt;span class="n"&gt;descendants&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;descendants_n1&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;descendants_n1&lt;/span&gt;
&lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;n2:&lt;/span&gt; &lt;span class="n"&gt;Node2&lt;/span&gt; &lt;span class="ss"&gt;{&lt;/span&gt;&lt;span class="py"&gt;prop:&lt;/span&gt;&lt;span class="s2"&gt;"b"&lt;/span&gt;&lt;span class="ss"&gt;})&lt;/span&gt;
&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;graph_util.descendants&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n2&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt; &lt;span class="k"&gt;YIELD&lt;/span&gt; &lt;span class="n"&gt;descendants&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;descendants_n2&lt;/span&gt;
&lt;span class="k"&gt;UNWIND&lt;/span&gt; &lt;span class="n"&gt;descendants_n1&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;dn1&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;dn1&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dn1&lt;/span&gt; &lt;span class="ow"&gt;IN&lt;/span&gt; &lt;span class="n"&gt;descendants_n2&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;true&lt;/span&gt;
&lt;span class="k"&gt;RETURN&lt;/span&gt; &lt;span class="n"&gt;dn1&lt;/span&gt;&lt;span class="ss"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The query above first finds all the descendants of node &lt;code&gt;n1&lt;/code&gt; and all the descendants of node &lt;code&gt;n2&lt;/code&gt;. If any descendant of node &lt;code&gt;n1&lt;/code&gt; is also a descendant of node &lt;code&gt;n2&lt;/code&gt;, we have found a common descendant, which is exactly what commonality looks for.&lt;/p&gt;
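The same commonality check can be sketched outside the database. Assuming the graph is held as a plain adjacency list (a hypothetical toy graph, not Memgraph's internal representation), finding common descendants reduces to intersecting two BFS-reachable sets:

```python
from collections import deque

def descendants(graph, start):
    """BFS over outgoing edges; returns every node reachable from start."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# Hypothetical toy graph: edges point from each key to its listed nodes.
graph = {"a": ["x", "y"], "b": ["y", "z"], "y": ["w"]}
common = descendants(graph, "a") & descendants(graph, "b")
print(sorted(common))  # ['w', 'y']
```

The Cypher query delegates the reachability part to `graph_util.descendants`; the sketch just makes the set intersection explicit.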

&lt;h3&gt;
  
  
  Alternative action
&lt;/h3&gt;

&lt;p&gt;Mistakes happen all the time. Once they do, the most important question is how to soften the blow. In finance, once a certain branch is deactivated, we need to find another way to get money transferred.&lt;/p&gt;

&lt;p&gt;To do so, we need to find content equivalence, which identifies similar paths between two nodes. It helps protect the business from future failures at certain points in the finance chain by finding alternative paths between those nodes. This involves node hopping and pattern matching, two operations that graph databases, Memgraph in particular, are optimized for. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fepdg3j43ve839hl61cuv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fepdg3j43ve839hl61cuv.png" alt="3-powerful-queries-to-find-patterns-in-your-knowledge-graph-you-havent-noticed-before/memgraph-alternative-action" width="800" height="613"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In Memgraph, you would ask such a question with the following query:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="k"&gt;MATCH&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;n:&lt;/span&gt;&lt;span class="n"&gt;Node3&lt;/span&gt; &lt;span class="ss"&gt;{&lt;/span&gt;&lt;span class="py"&gt;prop:&lt;/span&gt;&lt;span class="s2"&gt;"c"&lt;/span&gt;&lt;span class="ss"&gt;})&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;wShortest&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt; &lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;t:&lt;/span&gt;&lt;span class="n"&gt;Target&lt;/span&gt;&lt;span class="ss"&gt;),&lt;/span&gt; &lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;n2:&lt;/span&gt;&lt;span class="n"&gt;Node2&lt;/span&gt; &lt;span class="ss"&gt;{&lt;/span&gt;&lt;span class="py"&gt;prop:&lt;/span&gt;&lt;span class="s2"&gt;"b"&lt;/span&gt;&lt;span class="ss"&gt;})&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;n2&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;nodes&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;RETURN&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;sum&lt;/span&gt; &lt;span class="k"&gt;DESC&lt;/span&gt;&lt;span class="ss"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The query above finds the shortest paths between the starting node &lt;code&gt;n&lt;/code&gt; and the end node &lt;code&gt;t&lt;/code&gt; while avoiding the deactivated node &lt;code&gt;n2&lt;/code&gt;. It returns paths starting with the shortest one, so you can see the most effective way to substitute your current solution in case something fails along the way.&lt;/p&gt;
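Outside Cypher, the same "alternative action" question, the shortest route that avoids a failed node, can be sketched with a plain BFS on an unweighted graph. This is a simplified stand-in for the query above (the graph, node names, and the failed node are all made up for illustration; Memgraph's `wShortest` with weight 1 per hop behaves equivalently on unweighted data):

```python
from collections import deque

def shortest_path_avoiding(graph, start, goal, banned):
    """BFS for the shortest hop-count path from start to goal
    that never visits the deactivated node `banned`."""
    queue = deque([[start]])
    seen = {start, banned}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no alternative route exists

# Hypothetical finance chain: "c" reaches "t" normally via "b", which failed.
graph = {"c": ["b", "d"], "b": ["t"], "d": ["e"], "e": ["t"]}
print(shortest_path_avoiding(graph, "c", "t", banned="b"))  # ['c', 'd', 'e', 't']
```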

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Pattern matching doesn’t sound like a flashy analysis tool, but combined with graph analytics algorithms, it becomes a powerful one that can analyze almost any real-world graph, regardless of how complex it is. So, if you are struggling with highly interconnected data scattered all over the place, don’t hesitate to use a graph database and employ graph analytics and pattern matching to discover new knowledge - the kind relational databases can’t even start thinking about, let alone provide. Use a knowledge graph to uncover fraudulent activities or find alternative actions that can help avoid risks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://memgraph.com/blog?topics=Knowledge+Graphs&amp;amp;utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=blog_repost&amp;amp;utm_content=banner#list" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw0azgpsgm3wp9w5sd5wu.png" alt="Read more knowledge graphs on memgraph.com" width="800" height="169"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>mentorship</category>
      <category>career</category>
      <category>developers</category>
    </item>
    <item>
      <title>Understanding How Dynamic node2vec Works on Streaming Data</title>
      <dc:creator>Antonio Filipovic</dc:creator>
      <pubDate>Fri, 23 Dec 2022 14:08:48 +0000</pubDate>
      <link>https://dev.to/memgraph/understanding-how-dynamic-node2vec-works-on-streaming-data-9l9</link>
      <guid>https://dev.to/memgraph/understanding-how-dynamic-node2vec-works-on-streaming-data-9l9</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In this article, we will try to explain how node embeddings can be updated and &lt;strong&gt;calculated dynamically&lt;/strong&gt;, which basically means as new edges arrive in the graph. If you don't know anything about node embeddings yet, be sure to check out our blog post on the topic of &lt;a href="https://memgraph.com/blog/introduction-to-node-embedding?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=blog_repost" rel="noopener noreferrer"&gt;node embeddings&lt;/a&gt;. 📖&lt;/p&gt;

&lt;p&gt;There, we have explained what node embeddings are, where they can be applied, and why they perform so well. Even if you are familiar with everything mentioned, you can still &lt;a href="https://memgraph.com/blog/introduction-to-node-embedding?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=blog_repost" rel="noopener noreferrer"&gt;refresh your memory&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Dynamic networks
&lt;/h2&gt;

&lt;p&gt;Many methods, like &lt;a href="https://arxiv.org/abs/1607.00653" rel="noopener noreferrer"&gt;&lt;strong&gt;node2vec&lt;/strong&gt;&lt;/a&gt; and &lt;a href="https://arxiv.org/abs/1403.6652" rel="noopener noreferrer"&gt;&lt;strong&gt;deepwalk&lt;/strong&gt;&lt;/a&gt;, focus on computing embeddings for static graphs, which works well but has a big drawback. Networks in practical applications are &lt;strong&gt;dynamic&lt;/strong&gt; and &lt;strong&gt;evolve constantly&lt;/strong&gt; over time. New links are formed, and old ones can disappear. Moreover, new nodes can be introduced into the graph (e.g., users can join the social network) and create new links toward existing nodes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How could one deal with such networks?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;One idea could be to create a &lt;strong&gt;snapshot of a graph&lt;/strong&gt; when a new edge is created [&lt;a href="https://dl.acm.org/doi/abs/10.1145/1217299.1217301" rel="noopener noreferrer"&gt;Leskovec et al., 2007&lt;/a&gt;].  Naively applying static embedding algorithms to each snapshot leads to unsatisfactory performance due to the following challenges:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Stability&lt;/strong&gt;: the embeddings of graphs at consecutive time steps can differ substantially even though the graphs do not change much.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Growing graphs&lt;/strong&gt;: all existing approaches assume a fixed number of nodes in learning graph embeddings and thus cannot handle growing graphs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalability&lt;/strong&gt;: learning embeddings independently for each snapshot leads to running time linear in the number of snapshots. As learning a single embedding is computationally expensive, the naive approach does not scale to dynamic networks with many snapshots.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Dynamic Node2vec
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Dynamic node2vec&lt;/strong&gt; is a random-walk based method that creates embeddings for every new node added to the graph. For every new edge, the probabilities (weights) used in walk sampling are recalculated. The goal of the method is to make the embedding of node &lt;code&gt;v&lt;/code&gt; similar to the embeddings of nodes that can reach node &lt;code&gt;v&lt;/code&gt; across edges that appeared one after another. Don’t worry if this sounds confusing now, just remember that we have probability updates and walk sampling.&lt;/p&gt;

&lt;p&gt;Take a look at &lt;em&gt;Image 1&lt;/em&gt;. We sampled a walk as mentioned before. By doing so, we created a list of nodes, also known as a &lt;strong&gt;temporal walk&lt;/strong&gt;. The temporal part will be explained shortly. And of course, the embedding of a node should be similar to the embeddings of nodes in its temporal neighborhood. The algorithm itself consists of the following three parts, in this order:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;probabilities (weights) update&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;walk sampling&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;word2vec update&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fdynamic-node2vec-on-streaming-data%2Fmemgraph-tutorial-temporal-walk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fdynamic-node2vec-on-streaming-data%2Fmemgraph-tutorial-temporal-walk.png" alt="memgraph-tutorial-temporal-walk"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt; Image 1. Representation of a 
&lt;a href="https://appliednetsci.springeropen.com/articles/10.1007/s41109-019-0169-5" rel="noopener noreferrer"&gt;temporal walk&lt;/a&gt;
&lt;/center&gt;



&lt;p&gt;A few notes about the terms used in the rest of the text. When a new directed edge is added to the graph, it has a &lt;strong&gt;&lt;code&gt;source&lt;/code&gt;&lt;/strong&gt; node and a &lt;strong&gt;&lt;code&gt;target&lt;/code&gt;&lt;/strong&gt; node. &lt;strong&gt;Walk sampling&lt;/strong&gt; means creating a walk through a graph. Every node we visit during walk sampling is memorized in order of visit. A walk can be constructed in a forward fashion, meaning we choose one of the outgoing edges of the current node, or in a backward fashion, meaning we choose one of the incoming edges of the current node. In backward walk sampling, by choosing one of the incoming edges of the current node, we move to the &lt;strong&gt;&lt;code&gt;source&lt;/code&gt;&lt;/strong&gt; node of that edge, and we repeat the step. The process of walk sampling in a backward fashion can be seen in &lt;em&gt;Image 2&lt;/em&gt;. We start from node 9, and the sampled walk looks as follows: &lt;strong&gt;9,8,5,2,1&lt;/strong&gt;.&lt;br&gt;
For example, when we were on the node with id 5, we could have chosen a different edge, which would take us to node 3 or node 4. &lt;strong&gt;Important:&lt;/strong&gt; we are not yet looking at the &lt;strong&gt;edge appearance timestamp&lt;/strong&gt;, that is, when the edge appeared in the graph.&lt;/p&gt;
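The backward walk described above can be sketched as repeatedly stepping to the source node of a randomly chosen incoming edge. Timestamps are deliberately ignored here, as in the text; the incoming-edge lists are a hypothetical reading of Image 2:

```python
import random

def backward_walk(in_edges, start, walk_length):
    """Sample a walk by repeatedly moving to the source node of a
    randomly chosen incoming edge of the current node."""
    walk = [start]
    for _ in range(walk_length - 1):  # at most walk_length nodes in total
        sources = in_edges.get(walk[-1], [])
        if not sources:               # no incoming edges; the walk ends early
            break
        walk.append(random.choice(sources))
    return walk

# Hypothetical incoming-edge lists: node 9 is reachable from 8,
# node 8 from 5, node 5 from 2, 3 and 4, node 2 from 1.
in_edges = {9: [8], 8: [5], 5: [2, 3, 4], 2: [1]}
random.seed(0)
print(backward_walk(in_edges, start=9, walk_length=5))
```

Depending on the coin flips at node 5, the walk either continues down to node 1 (as in the 9,8,5,2,1 example) or stops early at node 3 or 4, which have no incoming edges.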

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fdynamic-node2vec-on-streaming-data%2Fmemgraph-tutorial-walk-sampling.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fdynamic-node2vec-on-streaming-data%2Fmemgraph-tutorial-walk-sampling.png" alt="memgraph-tutorial-walk-sampling"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt; Image 2. Illustration of walk sampling on a directed graph&lt;/center&gt;



&lt;p&gt;We will first try to explain &lt;strong&gt;walk sampling&lt;/strong&gt;, and then &lt;strong&gt;weight update&lt;/strong&gt;, although in the real process they are reversed. We first do weight update, then walk sampling.&lt;/p&gt;

&lt;h3&gt;
  
  
  Walk sampling
&lt;/h3&gt;

&lt;p&gt;In &lt;strong&gt;dynamic node2vec&lt;/strong&gt;, walk sampling is done in a &lt;strong&gt;time-dependent backward form&lt;/strong&gt;. Backward sampling was explained before, so it's important you have a basic understanding of it for the next section.&lt;/p&gt;

&lt;p&gt;"Temporal" sampling means that from all possible incoming edges we only consider those that appeared before the last edge in the walk. Take a look at &lt;em&gt;Image 3&lt;/em&gt; and assume that we are on node &lt;strong&gt;5&lt;/strong&gt;. Since the last edge we visited was between nodes 5 and 8 (the edge is from 5 to 8, but we went from 8 to 5) appeared in the graph before edges 3⟶5 and 4⟶5, we can't even consider taking them as the next step. The only option is edge 2⟶5.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fdynamic-node2vec-on-streaming-data%2Fmemgraph-tutorial-time-dependent-walk-sampling.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fdynamic-node2vec-on-streaming-data%2Fmemgraph-tutorial-time-dependent-walk-sampling.png" alt="memgraph-tutorial-time-dependent-walk-sampling"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt; Image 3. Time-dependent walk sampling &lt;/center&gt;



&lt;p&gt;Since one node can have multiple incoming edges, when sampling a walk we use &lt;strong&gt;probabilities&lt;/strong&gt; (weights) to determine which incoming edge of the node we will take next. It is like a biased coin flip. Take a look at &lt;em&gt;Image 1&lt;/em&gt;. From node &lt;code&gt;u&lt;/code&gt; you can take edge &lt;code&gt;t4&lt;/code&gt; or &lt;code&gt;t5&lt;/code&gt;. When creating a walk, we want to visit nodes that were more recently connected by a new edge. They carry &lt;strong&gt;more information&lt;/strong&gt; and therefore have more importance to the graph than the old edges.&lt;/p&gt;

&lt;p&gt;These probabilities are computed &lt;strong&gt;after&lt;/strong&gt; the edge arrives and &lt;strong&gt;before&lt;/strong&gt; temporal walk sampling. Walk sampling can be stopped at any point, but at the latest when we sample &lt;strong&gt;walk_length&lt;/strong&gt; nodes in the walk. &lt;strong&gt;walk_length&lt;/strong&gt; is a &lt;a href="https://en.wikipedia.org/wiki/Hyperparameter_(machine_learning)" rel="noopener noreferrer"&gt;hyperparameter&lt;/a&gt;: it is set before the algorithm starts and determines the maximum size of &lt;strong&gt;every&lt;/strong&gt; walk in the algorithm. Whether walk sampling stops is determined by a biased coin flip that depends on &lt;strong&gt;weight_of_current_node&lt;/strong&gt;. For more details, feel free to check the &lt;a href="https://appliednetsci.springeropen.com/track/pdf/10.1007/s41109-019-0169-5.pdf" rel="noopener noreferrer"&gt;paper by Ferenc Béres&lt;/a&gt;.&lt;/p&gt;
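The biased coin flip over candidate incoming edges can be sketched with Python's `random.choices`. The candidate nodes and weights below are made up, and the stochastic stopping rule from the paper is omitted; this only shows the weighted edge choice:

```python
import random

def sample_step(candidates, weights):
    """Biased coin flip over candidate edges: an edge is picked with
    probability proportional to its weight."""
    return random.choices(candidates, weights=weights, k=1)[0]

random.seed(42)
candidates = [2, 3, 4]       # possible previous nodes in the walk
weights = [0.7, 0.2, 0.1]    # fresher edges carry larger weight
counts = {c: 0 for c in candidates}
for _ in range(10_000):
    counts[sample_step(candidates, weights)] += 1
print(counts)  # roughly 7000 / 2000 / 1000
```

Over many draws, the node behind the freshest (heaviest) edge is chosen most often, which is exactly the bias the algorithm wants.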

&lt;h3&gt;
  
  
  Probabilities (weights) update
&lt;/h3&gt;

&lt;p&gt;The weight of a node and the probability used in walk sampling are two sides of the same coin. The weight represents a sum over all temporal walks &lt;code&gt;z&lt;/code&gt; ending in node &lt;code&gt;v&lt;/code&gt; that use edges arriving no later than the latest edge already sampled in the temporal walk. When the algorithm decides which edge to take next during temporal walk creation, it uses these computed weights (probabilities). Every time a new edge appears in the graph, these probabilities are updated for just &lt;strong&gt;two nodes&lt;/strong&gt;: the &lt;code&gt;source&lt;/code&gt; and &lt;code&gt;target&lt;/code&gt; of the &lt;strong&gt;new edge&lt;/strong&gt;. In &lt;em&gt;Image 4&lt;/em&gt; you can see how the probability is updated. For node &lt;code&gt;u&lt;/code&gt;, we need to check whether some walks were already sampled for it. If there were, we update the weight of the node (the sum of walks ending in that node) by multiplying it with the time-decayed factor &lt;em&gt;exp(−c · (t(uv) − tᵤ))&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Here &lt;em&gt;tᵤ&lt;/em&gt; represents the last time an edge pointing to node &lt;em&gt;u&lt;/em&gt; appeared in the graph, and &lt;em&gt;t(uv)&lt;/em&gt; represents the time of the current edge &lt;em&gt;u⟶v&lt;/em&gt;. Afterwards, for node &lt;em&gt;v&lt;/em&gt;, make sure to sum up the walks from node &lt;em&gt;u&lt;/em&gt; and the walks ending in &lt;em&gt;v&lt;/em&gt; multiplied with the time-decayed factor &lt;em&gt;exp(−c · (t(uv) − tᵥ))&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;This &lt;strong&gt;time-decayed factor&lt;/strong&gt; is here to give more weight to nodes with &lt;strong&gt;fresher edges&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fdynamic-node2vec-on-streaming-data%2Fmemgraph-tutorial-walk-sampling-math.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fdynamic-node2vec-on-streaming-data%2Fmemgraph-tutorial-walk-sampling-math.png" alt="memgraph-tutorial-walk-sampling-math"&gt;&lt;/a&gt; &lt;/p&gt;

&lt;center&gt; Image 4. Walk sampling by author from &lt;a href="https://appliednetsci.springeropen.com/articles/10.1007/s41109-019-0169-5" rel="noopener noreferrer"&gt;paper&lt;/a&gt;
&lt;/center&gt;
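The bookkeeping from Image 4 can be sketched numerically. This is a loose sketch, not the paper's exact recursion: `c` is the decay hyperparameter, the base weight of 1 for a node with no sampled walks is an assumption, and the decay factor exp(−c · Δt) follows the description above:

```python
import math

def update_weights(weight, last_time, u, v, t_uv, c=0.1):
    """Update walk-count weights when a new edge u-to-v arrives.
    weight[x] approximates the (decayed) number of temporal walks
    ending in node x; last_time[x] is the newest edge time seen at x."""
    # Decay u's existing walk count by the time since its last update.
    decay_u = math.exp(-c * (t_uv - last_time.get(u, t_uv)))
    weight[u] = weight.get(u, 1.0) * decay_u
    # Walks ending in v: its own decayed walks plus the walks
    # that can now be extended over the new edge from u.
    decay_v = math.exp(-c * (t_uv - last_time.get(v, t_uv)))
    weight[v] = weight.get(v, 1.0) * decay_v + weight[u]
    last_time[u] = last_time[v] = t_uv
    return weight

weight, last_time = {}, {}
update_weights(weight, last_time, u=1, v=2, t_uv=10)
update_weights(weight, last_time, u=2, v=3, t_uv=12)
print(weight)
```

After the second edge, node 2's weight has decayed by exp(−0.1 · 2) and node 3 has inherited it, so fresher activity keeps dominating the weights.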

&lt;h3&gt;
  
  
  Word2Vec update
&lt;/h3&gt;

&lt;p&gt;This is the part where we optimize our node embeddings to be as similar as possible. We also make use of the &lt;a href="https://en.wikipedia.org/wiki/Word2vec" rel="noopener noreferrer"&gt;&lt;strong&gt;word2vec&lt;/strong&gt;&lt;/a&gt; method mentioned earlier.&lt;/p&gt;

&lt;p&gt;After walk sampling, we use these prepared temporal walks to make node embeddings more similar to those of the nodes in their temporal neighborhood. What does this mean? Let's say that our maximum walk length &lt;code&gt;walk_length&lt;/code&gt; is set to 4, and the number of walks &lt;code&gt;walk_num&lt;/code&gt; is set to 3. These hyperparameters can be found in our implementation of &lt;a href="https://github.com/memgraph/mage/blob/main/python/node2vec_online.py" rel="noopener noreferrer"&gt;Dynamic Node2Vec on GitHub&lt;/a&gt;. Let's imagine we sampled the following temporal walks for node &lt;code&gt;9&lt;/code&gt; in the graph on &lt;em&gt;Image 3&lt;/em&gt;: &lt;code&gt;[1,2,6,9], [1,2,5,9], [5,7,9]&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Don't forget that this &lt;code&gt;walk_length&lt;/code&gt; is the maximum length - walks can also be shorter. Now we need to make our embedding of node &lt;code&gt;9&lt;/code&gt; as similar as possible to the embeddings of nodes that appeared in those temporal walks. In math terms, this means the following: we seek to optimize the &lt;strong&gt;objective function&lt;/strong&gt; which maximizes the logarithmic probability of &lt;strong&gt;observing a network neighborhood&lt;/strong&gt; &lt;em&gt;Nₛ(u)&lt;/em&gt; for node &lt;em&gt;u&lt;/em&gt; based on its feature representation (representation in embedded space). We also make some assumptions so the problem is easier to reason about. This is the formula:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fdynamic-node2vec-on-streaming-data%2Fmemgraph-tutorial-formula-1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fdynamic-node2vec-on-streaming-data%2Fmemgraph-tutorial-formula-1.png" alt="memgraph-tutorial-formula-1"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here &lt;em&gt;Pr(Nₛ(u) | f(u))&lt;/em&gt; is the probability of observing the neighborhood nodes of node &lt;em&gt;u&lt;/em&gt; given that we are currently at the position of node &lt;em&gt;u&lt;/em&gt; in embedded space. For example, if the embedding of node &lt;em&gt;u&lt;/em&gt; is the vector &lt;em&gt;[0.5, 0.6]&lt;/em&gt;, and you imagine standing at that point, what is the &lt;strong&gt;likelihood&lt;/strong&gt; of observing the neighborhood nodes of node &lt;em&gt;u&lt;/em&gt;? For each node (that's why there is a summation), we want to make this probability as high as possible.&lt;/p&gt;

&lt;p&gt;It sounds complicated, but with the following assumptions, it will get easier to comprehend:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;The &lt;strong&gt;first&lt;/strong&gt; one is that observing a neighborhood node is &lt;strong&gt;independent&lt;/strong&gt; of observing any other neighborhood node given its feature representation. For example, in &lt;em&gt;Image 3&lt;/em&gt;, there is a probability for us to observe nodes 6, 7, and 8 if we are on node 9. This is just a relaxation for optimization purposes used often in &lt;a href="https://towardsdatascience.com/maximum-likelihood-estimation-984af2dcfcac" rel="noopener noreferrer"&gt;machine learning&lt;/a&gt; which makes it easier for us to compute the above mentioned probability &lt;em&gt;Pr(Nₛ(u) | f(u))&lt;/em&gt;. Our probabilities (weights) from the chapter of &lt;strong&gt;Probabilities (weights) update&lt;/strong&gt; are already incorporated in walk sampling. This relaxation is here just for probability calculation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;strong&gt;second&lt;/strong&gt; one is that observing the &lt;strong&gt;source node&lt;/strong&gt; and a &lt;strong&gt;neighborhood node&lt;/strong&gt; in the &lt;strong&gt;graph&lt;/strong&gt; is the same as observing the &lt;strong&gt;feature representation (embedding) of the source node&lt;/strong&gt; and the &lt;strong&gt;feature representation of the neighborhood node&lt;/strong&gt; in &lt;strong&gt;feature space&lt;/strong&gt;. Note that in the probability calculation &lt;em&gt;Pr(Nₛ(u)|f(u))&lt;/em&gt; we mixed the feature representation of a node &lt;em&gt;f(u)&lt;/em&gt; with regular nodes, which sounds like comparing apples and oranges, but with the following relaxation it becomes the same thing. So for some neighborhood node &lt;em&gt;nᵢ&lt;/em&gt; in the neighborhood &lt;em&gt;Nₛ&lt;/em&gt; of node &lt;em&gt;u&lt;/em&gt; we have the following:&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fdynamic-node2vec-on-streaming-data%2Fmemgraph-tutorial-formula-2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fdynamic-node2vec-on-streaming-data%2Fmemgraph-tutorial-formula-2.png" alt="memgraph-tutorial-formula-2"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The exponential term ensures that the sum of probabilities of every neighborhood node of node &lt;em&gt;u&lt;/em&gt; is 1. This is also called a &lt;a href="https://towardsdatascience.com/softmax-function-simplified-714068bf8156" rel="noopener noreferrer"&gt;softmax function&lt;/a&gt;.&lt;/p&gt;
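Under the second assumption, the conditional probability is a softmax over dot products of embeddings. A minimal sketch with made-up 2-dimensional embeddings (the node names and vectors are purely illustrative):

```python
import math

def softmax_probability(f_u, f_ni, all_embeddings):
    """Pr(n_i | f(u)) = exp(f(n_i) . f(u)) / sum over v of exp(f(v) . f(u))."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    numerator = math.exp(dot(f_ni, f_u))
    denominator = sum(math.exp(dot(f_v, f_u)) for f_v in all_embeddings)
    return numerator / denominator

# Hypothetical embeddings for nodes u, a, b.
embeddings = {"u": [0.5, 0.6], "a": [0.4, 0.7], "b": [-0.6, 0.1]}
p_a = softmax_probability(embeddings["u"], embeddings["a"], embeddings.values())
p_b = softmax_probability(embeddings["u"], embeddings["b"], embeddings.values())
print(round(p_a, 3), round(p_b, 3))  # node a, closer in direction to u, gets the higher probability
```

The denominator is what makes the probabilities over all candidate nodes sum to 1, which is the normalization the paragraph above refers to.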

&lt;p&gt;This is our optimization problem. Now, we hope you have an idea of what our goal is. &lt;br&gt;
Luckily for us, this is already implemented in a Python module called &lt;a href="https://radimrehurek.com/gensim/" rel="noopener noreferrer"&gt;&lt;strong&gt;gensim&lt;/strong&gt;&lt;/a&gt;. Yes, these folks are brilliant at natural language processing, and we will make use of their work. 🤝&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Dynamic node2vec&lt;/strong&gt; offers a good solution for growing networks. We learned where embeddings can be applied, we mentioned the drawbacks of static graph embedding algorithms and learned about the benefits of &lt;strong&gt;dynamic node2vec&lt;/strong&gt;. All that's left is to try it out yourself.&lt;/p&gt;

&lt;p&gt;Our team of engineers is currently tackling the problem of graph analytics algorithms on &lt;strong&gt;real-time data&lt;/strong&gt;. If you want to discuss how to apply &lt;strong&gt;online/streaming algorithms&lt;/strong&gt; on connected data, feel free to join our &lt;strong&gt;&lt;a href="https://memgr.ph/join-discord" rel="noopener noreferrer"&gt;Discord server&lt;/a&gt;&lt;/strong&gt; and message us.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MAGE&lt;/strong&gt; shares his wisdom on a &lt;a href="https://twitter.com/intent/follow?screen_name=memgraphmage" rel="noopener noreferrer"&gt;&lt;strong&gt;Twitter&lt;/strong&gt; account&lt;/a&gt;. Get to know him better by clicking the follow button! 🐦&lt;/p&gt;

&lt;p&gt;&lt;a href="https://twitter.com/intent/follow?screen_name=memgraphmage" rel="noopener noreferrer"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Fdynamic-node2vec-on-streaming-data%2Fmemgraph-mage-twitter.jpg" alt="memgraph-labelrankt-tutorial-mage-twitter"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Last but not least, check out &lt;a href="https://github.com/memgraph/mage" rel="noopener noreferrer"&gt;&lt;strong&gt;MAGE&lt;/strong&gt;&lt;/a&gt; and don’t hesitate to give a star ⭐ or contribute with new ideas.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://memgraph.com/blog?topics=Graph+Algorithms&amp;amp;utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=blog_repost&amp;amp;utm_content=banner#list" rel="noopener noreferrer"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw0azgpsgm3wp9w5sd5wu.png" alt="Read more about graph algorithms on memgraph.com"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Introduction to Node Embedding</title>
      <dc:creator>Antonio Filipovic</dc:creator>
      <pubDate>Wed, 14 Dec 2022 12:24:14 +0000</pubDate>
      <link>https://dev.to/memgraph/introduction-to-node-embedding-2ccg</link>
      <guid>https://dev.to/memgraph/introduction-to-node-embedding-2ccg</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In this article, we will try to answer the following questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What are &lt;strong&gt;node embeddings&lt;/strong&gt;?&lt;/li&gt;
&lt;li&gt;How to generate node embeddings?&lt;/li&gt;
&lt;li&gt;When can we even use node embeddings? &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is a lot to cover in one article, but let's give it our best. Prior knowledge of graphs or &lt;strong&gt;&lt;a href="https://searchenterpriseai.techtarget.com/definition/machine-learning-ML"&gt;machine learning&lt;/a&gt;&lt;/strong&gt; is not necessary, just a bonus.&lt;br&gt;
First, let's start with graphs. 🚀&lt;/p&gt;

&lt;h2&gt;
  
  
  Graphs
&lt;/h2&gt;

&lt;p&gt;Graphs consist of &lt;strong&gt;nodes&lt;/strong&gt; and &lt;strong&gt;edges&lt;/strong&gt; - connections between the nodes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--X0tcIRs6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://public-assets.memgraph.com/node-embeddings/memgraph-tutorial-graph-sketch.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--X0tcIRs6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://public-assets.memgraph.com/node-embeddings/memgraph-tutorial-graph-sketch.png" alt="memgraph-tutorial-graph-sketch" width="773" height="408"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;
&lt;center&gt; Node and edge on a graph &lt;/center&gt;

&lt;p&gt;In social networks, nodes could represent users, and links between them could represent friendships. &lt;/p&gt;

&lt;p&gt;One interesting thing you can do with graphs is to predict which tweets on Twitter are from bots, and which are from organic users. How would we achieve that? Well, stick around and you will get an idea of how it can be done.&lt;/p&gt;

&lt;h2&gt;
  
  
  What are node embeddings?
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;So what does embedding mean, and why is it useful?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;To &lt;strong&gt;embed&lt;/strong&gt;, per the English dictionary, means to fix something in a substance or solid object. With graphs, it means mapping the whole graph into &lt;strong&gt;N-dimensional space&lt;/strong&gt;. Take a look at the example below, where we mapped all the nodes into 2-dimensional space. Now it should be obvious that we have two clusters (or communities) in the graph. For us humans, it is easier to identify clusters in 2-dimensional space. In this example, it is also easy to spot the clusters just from the graph layout, but imagine the graph having 1000 nodes - things aren't as straightforward anymore. &lt;/p&gt;

&lt;p&gt;Furthermore, it is easier for a computer to work with node embeddings (&lt;strong&gt;vectors of numbers&lt;/strong&gt;), because calculating how similar (close in space) two nodes are is easier from embeddings in N-dimensional space than from the graph alone. There is no proper way to calculate the closeness of two nodes just from the graph. You could use something like the &lt;a href="https://memgraph.com/docs/memgraph/concepts/graph-algorithms/?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=blog_repost#breadth-first-search"&gt;&lt;strong&gt;shortest path algorithm&lt;/strong&gt;&lt;/a&gt;, but that by itself is not representative enough. With vectors, it's easier. The most often used metric is called &lt;a href="https://www.geeksforgeeks.org/cosine-similarity/"&gt;&lt;strong&gt;cosine similarity&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;
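Cosine similarity between two embedding vectors can be computed in a few lines; the vectors below are toy values, purely for illustration:

```python
import math

def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (|a| * |b|); 1 means the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# Two nodes from the same cluster vs. one from the other cluster.
print(cosine_similarity([0.9, 0.8], [0.8, 0.9]))   # close to 1
print(cosine_similarity([0.9, 0.8], [-0.7, 0.1]))  # much lower
```

Nodes whose embeddings point in similar directions score near 1, which is what "close in space" means in practice.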

&lt;p&gt;Now we have something a computer can work with:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--xoK9M_wb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://public-assets.memgraph.com/node-embeddings/memgraph-tutorial-example-embedding.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--xoK9M_wb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://public-assets.memgraph.com/node-embeddings/memgraph-tutorial-example-embedding.png" alt="memgraph-tutorial-example-embedding" width="880" height="355"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;
&lt;center&gt;Example of embedding&lt;/center&gt;

&lt;blockquote&gt;
&lt;p&gt;Now we know what embeddings are, but what do we use node embeddings for?&lt;/p&gt;


&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/Supervised_learning"&gt;&lt;strong&gt;Supervised machine learning&lt;/strong&gt;&lt;/a&gt; is a subset of machine learning where algorithms try to learn from data. Data is represented by &lt;strong&gt;input-output pairs&lt;/strong&gt;, i.e. [2] -&amp;gt; 2, [1] -&amp;gt; 1. Our model tries to learn from data in such a way that it maps inputs to the correct outputs. In our example ([2] -&amp;gt; 2, [1] -&amp;gt; 1) model would try to learn function y=x. Here, it would be pretty easy for the model to learn input-output mapping, but imagine a problem where a lot of different points from input space &lt;strong&gt;map to same output value&lt;/strong&gt;. That's why we can't directly apply a machine learning algorithm to our input-output pairs, but we first need to find a set of &lt;em&gt;informative&lt;/em&gt;, &lt;em&gt;discriminating&lt;/em&gt;, and &lt;em&gt;independent&lt;/em&gt; features amongst input data points. Finding such features is an often difficult task.&lt;/p&gt;

&lt;p&gt;In &lt;strong&gt;prediction&lt;/strong&gt; problems on &lt;strong&gt;networks&lt;/strong&gt;, we would need to do the same for the nodes and edges. A typical solution involves hand-engineering domain-specific features based on expert knowledge. Even if one discounts the tedious effort required for &lt;strong&gt;feature engineering&lt;/strong&gt;, such features are usually designed for specific tasks and do not generalize across different prediction tasks.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This sounds like bad news. 👎&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We want our algorithm to be independent of the downstream prediction task, with representations that can be learned in a purely &lt;a href="https://en.wikipedia.org/wiki/Unsupervised_learning"&gt;&lt;strong&gt;unsupervised&lt;/strong&gt;&lt;/a&gt; way. This is where &lt;strong&gt;node embeddings&lt;/strong&gt; come into play.&lt;/p&gt;

&lt;p&gt;We will make our algorithm learn &lt;strong&gt;embeddings&lt;/strong&gt;, and after that, we can apply those embeddings in any of the following applications, one of which is Twitter bot detection. Let's dig in.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to generate node embeddings?
&lt;/h2&gt;

&lt;p&gt;Researchers have divided these methods into three broad categories: &lt;br&gt;
1) &lt;strong&gt;Factorization based&lt;/strong&gt;&lt;br&gt;
2) &lt;strong&gt;Random Walk based&lt;/strong&gt;&lt;br&gt;
3) &lt;strong&gt;Deep Learning based&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Factorization based
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Factorization based&lt;/strong&gt; algorithms represent the connections between nodes in the form of a matrix and factorize this matrix to obtain the embedding. In one such method called &lt;a href="https://towardsdatascience.com/lle-locally-linear-embedding-a-nifty-way-to-reduce-dimensionality-in-python-ab5c38336107"&gt;Locally Linear Embedding&lt;/a&gt;, there is the assumption that &lt;strong&gt;every node is a linear combination of its neighbors&lt;/strong&gt;, so the algorithm tries to represent the embedding of every node as a linear combination of its &lt;strong&gt;neighbors' embeddings&lt;/strong&gt;. It is like the example from high school where you need to represent one vector as a &lt;a href="https://www.mathbootcamps.com/linear-combinations-vectors/"&gt;linear combination of two other vectors&lt;/a&gt;. Only here, you can have multiple vectors, and they are much more complex. &lt;/p&gt;
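&lt;p&gt;The "linear combination" part can be sketched with plain 2-dimensional vectors: given the embeddings of two neighbors, solve for the weights that reproduce the node's own embedding. This is only a toy illustration with invented numbers, not the actual LLE solver (which additionally constrains the weights):&lt;/p&gt;

```python
def solve_2x2(v1, v2, target):
    """Solve w1*v1 + w2*v2 = target for (w1, w2) using Cramer's rule."""
    det = v1[0] * v2[1] - v2[0] * v1[1]
    w1 = (target[0] * v2[1] - v2[0] * target[1]) / det
    w2 = (v1[0] * target[1] - target[0] * v1[1]) / det
    return w1, w2

# A node's embedding expressed through its two neighbors' embeddings
neighbor_a = [1.0, 0.0]
neighbor_b = [0.0, 1.0]
node = [0.4, 0.6]
w1, w2 = solve_2x2(neighbor_a, neighbor_b, node)
print(w1, w2)  # 0.4 0.6: node = 0.4 * neighbor_a + 0.6 * neighbor_b
```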

&lt;h3&gt;
  
  
  2. Random walks
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Random walk&lt;/strong&gt; based methods use a walk approach to generate (sample) network neighborhoods for nodes. For every node, we generate its network neighborhood by choosing the next node of the walk in some way (depending on the method, the choice can be uniformly random or biased by probabilities). Take a look at the picture below to see what such a walk looks like. &lt;/p&gt;

&lt;p&gt;The maximum walk length is determined before this process of &lt;strong&gt;walk sampling&lt;/strong&gt;, and for every node, we generate &lt;strong&gt;N&lt;/strong&gt; random walks. By doing so, we have created a &lt;strong&gt;network neighborhood&lt;/strong&gt; of a node. And now our goal would be to make a node &lt;strong&gt;as similar as possible to nodes in its network neighborhood&lt;/strong&gt;.&lt;/p&gt;
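&lt;p&gt;A minimal sketch of that sampling step (the toy graph, walk length, and number of walks per node are made up; methods like node2vec additionally bias the choice of the next node instead of picking uniformly):&lt;/p&gt;

```python
import random

def sample_walks(adj, walk_length, walks_per_node, seed=0):
    """Sample `walks_per_node` uniform random walks of at most `walk_length` nodes per start node."""
    rng = random.Random(seed)
    walks = []
    for start in adj:
        for _ in range(walks_per_node):
            walk = [start]
            for _ in range(walk_length - 1):
                neighbors = adj[walk[-1]]
                if not neighbors:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks

# Toy graph: two triangles joined by the bridge edge 3-4
adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3, 5, 6], 5: [4, 6], 6: [4, 5]}
walks = sample_walks(adj, walk_length=4, walks_per_node=2)
print(len(walks))  # 12: two walks for each of the six nodes
```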

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--SrUkicAP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://public-assets.memgraph.com/node-embeddings/memgraph-tutorial-random-walk.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--SrUkicAP--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://public-assets.memgraph.com/node-embeddings/memgraph-tutorial-random-walk.jpg" alt="memgraph-tutorial-random-walk" width="750" height="750"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;
&lt;center&gt;Example of a graph walk with three steps&lt;/center&gt;

&lt;blockquote&gt;
&lt;p&gt;Again with this boring question, but why do this? &lt;/p&gt;


&lt;/blockquote&gt;

&lt;p&gt;It turns out this process has proven very effective in another area, &lt;a href="https://en.wikipedia.org/wiki/Natural_language_processing"&gt;&lt;strong&gt;natural language processing&lt;/strong&gt;&lt;/a&gt;, which deals with words and documents where you want to find similar words. For example, the words &lt;em&gt;"intelligent"&lt;/em&gt; and "&lt;em&gt;smart&lt;/em&gt;" should be similar. This method in natural language processing is called &lt;a href="https://towardsdatascience.com/introduction-to-word-embedding-and-word2vec-652d0c2060fa"&gt;&lt;strong&gt;word2vec&lt;/strong&gt;&lt;/a&gt;. Words that appear in a similar context (the words before or after them) should be similar. Thankfully, the same applies to nodes: nodes that appear in a similar context (sampled walks) should be similar. &lt;strong&gt;Our process of walk sampling is used to create a dataset on which we will try to make node embeddings as similar as possible.&lt;/strong&gt; 🤯 And that is it. The dataset in &lt;strong&gt;word2vec&lt;/strong&gt; methods is every sentence of a document, and analogously for us, it is every sampled graph random walk.&lt;/p&gt;
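&lt;p&gt;The analogy can be made concrete: each sampled walk is treated like a sentence, and pairs of nodes that co-occur within a context window become the training examples, exactly as word2vec does with words. A small sketch of extracting such (center, context) pairs (the walk and the window size are invented):&lt;/p&gt;

```python
def skipgram_pairs(walk, window):
    """Yield (center, context) training pairs from one walk, word2vec style."""
    pairs = []
    for i, center in enumerate(walk):
        # Every node at distance at most `window` from position i is context
        for j in range(max(0, i - window), min(len(walk), i + window + 1)):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs

walk = [1, 2, 3, 4]  # one sampled random walk
print(skipgram_pairs(walk, window=1))
# [(1, 2), (2, 1), (2, 3), (3, 2), (3, 4), (4, 3)]
```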

&lt;h3&gt;
  
  
  3. Deep learning
&lt;/h3&gt;

&lt;p&gt;The growing research on &lt;strong&gt;deep learning&lt;/strong&gt; has led to the usage of deep neural network-based methods applied to graphs. With deep learning, it is easier to model non-linear structures, so &lt;strong&gt;deep autoencoders&lt;/strong&gt; have been used for dimensionality reduction. A few popular methods from this area are called &lt;a href="https://paperswithcode.com/paper/structural-deep-network-embedding"&gt;&lt;strong&gt;Structural Deep Network Embedding (SDNE)&lt;/strong&gt;&lt;/a&gt; and &lt;a href="https://paperswithcode.com/paper/deep-neural-networks-for-learning-graph"&gt;&lt;strong&gt;Deep Neural Networks for Learning Graph Representations (DNGR)&lt;/strong&gt;&lt;/a&gt; so feel free to check them out.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where can node embeddings be applied?
&lt;/h2&gt;

&lt;p&gt;We know that &lt;strong&gt;graphs&lt;/strong&gt; occur naturally in various real-world scenarios such as social networks (social sciences), word co-occurrence networks (linguistics), interaction networks (i.e. &lt;a href="https://memgraph.com/blog/identifying-essential-proteins?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=blog_repost"&gt;Protein-Protein interactions in biology&lt;/a&gt;), and so on. Modeling the interactions between entities as graphs has enabled researchers to understand the various networks in a systematic manner. For example, social networks have been used for applications like friendship or content recommendations, as well as for advertisement. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;But how are researchers modeling such interactions?&lt;br&gt;
You may already have the answer.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;By embedding a large graph in low-dimensional space (a.k.a. &lt;strong&gt;node embeddings&lt;/strong&gt;). Embeddings have recently attracted significant interest due to their wide applications in areas such as graph visualization, link prediction, clustering, and node classification. It has been demonstrated that &lt;strong&gt;graph embedding is superior to alternatives&lt;/strong&gt; in many supervised learning tasks, such as node classification, link prediction, and graph reconstruction. Here is a chronological list of research papers you can check out: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://dl.acm.org/doi/abs/10.1145/2488388.2488393"&gt;Distributed large-scale natural graph factorization, Ahmed et al., 2013&lt;/a&gt;; &lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dl.acm.org/doi/abs/10.1145/2623330.2623732"&gt;DeepWalk: online learning of social representations, Perozzi et al., 2014&lt;/a&gt;; &lt;/li&gt;
&lt;li&gt;
&lt;a href="https://ojs.aaai.org/index.php/AAAI/article/view/10179"&gt;Deep Neural Networks for Learning Graph Representations, Cao, et al., 2015&lt;/a&gt;; &lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dl.acm.org/doi/abs/10.1145/2736277.2741093"&gt;LINE: Large-scale Information Network Embedding, Tang, et al., 2015&lt;/a&gt;;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dl.acm.org/doi/abs/10.1145/2939672.2939754"&gt;node2vec: Scalable Feature Learning for Networks, Grover and Leskovec, 2016&lt;/a&gt;;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dl.acm.org/doi/abs/10.1145/2939672.2939751"&gt;Asymmetric Transitivity Preserving Graph Embedding, Ou et al., 2016&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Node classification
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Node classification&lt;/strong&gt; aims to determine the label of nodes (a.k.a. vertices) based on other labeled nodes and the topology of the network. Often in networks, only a fraction of nodes are labeled. In &lt;strong&gt;social networks&lt;/strong&gt;, labels may indicate interests, beliefs, or demographics, whereas the labels of entities in &lt;strong&gt;biology networks&lt;/strong&gt; may be based on functionality. For example, researchers have painstakingly worked out the functional roles of specific proteins in their system of interest and characterized their interaction partners and the pathways in which they function, but many proteins still haven't been worked out completely. With embeddings, we could try to &lt;strong&gt;predict the missing labels&lt;/strong&gt; with high precision.&lt;/p&gt;
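&lt;p&gt;A toy sketch of the idea (the protein embeddings and labels are invented; in practice you would train a proper classifier on the embedding vectors rather than just take the nearest labeled node):&lt;/p&gt;

```python
import math

def predict_label(embeddings, labels, node):
    """Assign the label of the labeled node closest to `node` in embedding space."""
    nearest = min(labels, key=lambda other: math.dist(embeddings[node], embeddings[other]))
    return labels[nearest]

# p2's function is unknown; p1 and p3 have been characterized experimentally
embeddings = {"p1": [0.1, 0.2], "p2": [0.15, 0.25], "p3": [0.9, 0.8]}
labels = {"p1": "kinase", "p3": "transporter"}
print(predict_label(embeddings, labels, "p2"))  # kinase
```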

&lt;h3&gt;
  
  
  Link prediction
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Link prediction&lt;/strong&gt; refers to the task of predicting missing links or links that are likely to occur in the future. For example, in a &lt;strong&gt;Protein-Protein network&lt;/strong&gt;, where verifying the existence of links between protein nodes requires costly experimental tests, link prediction can save a lot of money by letting you test only the pairs where you have a higher chance of guessing correctly. &lt;/p&gt;
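&lt;p&gt;One common recipe can be sketched as follows (the embeddings and known interactions are invented, and cosine similarity is just one possible scoring scheme): rank the non-existing pairs by the similarity of their endpoint embeddings and test only the top of the list:&lt;/p&gt;

```python
import math
from itertools import combinations

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Hypothetical protein embeddings; A, B, C form one cluster, D sits apart
embeddings = {"A": [1.0, 0.0], "B": [0.9, 0.1], "C": [0.95, 0.05], "D": [0.0, 1.0]}
known_edges = {frozenset({"A", "B"}), frozenset({"B", "C"})}

# Rank the missing pairs; the most similar endpoints are the best candidates to test
candidates = [p for p in combinations(embeddings, 2) if frozenset(p) not in known_edges]
candidates.sort(key=lambda p: cosine(embeddings[p[0]], embeddings[p[1]]), reverse=True)
print(candidates[0])  # ('A', 'C') -- the most promising missing link
```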

&lt;h3&gt;
  
  
  Clustering
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Clustering&lt;/strong&gt; is used to find subsets of similar nodes and group them; finally, &lt;strong&gt;visualization&lt;/strong&gt; helps in providing insights into the structure of the network.&lt;/p&gt;

&lt;p&gt;So back to our &lt;strong&gt;bot case&lt;/strong&gt;. One assumption could be that bots have a small number of links to real users (because who would want to be friends with them?), but a lot of links among themselves, so that they appear as real users. Graph clustering or community detection comes into play here: we want to find those clusters and remove the bot users. This can be done with node embeddings, especially &lt;strong&gt;&lt;a href="https://memgraph.com/blog/online-node2vec-recommendation-system?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=blog_repost"&gt;dynamic node embeddings&lt;/a&gt;&lt;/strong&gt;, where interactions happen every second. &lt;/p&gt;
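&lt;p&gt;As a toy illustration of that idea (invented 2-dimensional user embeddings and a bare-bones k-means, not Memgraph's implementation): cluster the embeddings and inspect the small, dense group:&lt;/p&gt;

```python
import math

def kmeans(points, k, iters=20):
    """Bare-bones k-means: assign points to the nearest centroid, then recompute centroids."""
    centroids = points[:k]  # naive initialization: the first k points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            [sum(dim) / len(cluster) for dim in zip(*cluster)] if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return clusters

# Real users sit around (0, 0); the suspected bot accounts group around (5, 5)
users = [[0.1, 0.2], [0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
clusters = kmeans(users, k=2)
print(sorted(len(c) for c in clusters))  # [2, 3]: the small cluster is the bot group
```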

&lt;p&gt;There is a cool article &lt;a href="https://dl.acm.org/doi/pdf/10.1145/2818717"&gt;The Rise of Social Bots&lt;/a&gt; in which you can read how &lt;strong&gt;bots&lt;/strong&gt; are used to affect and possibly manipulate the online debate about &lt;strong&gt;vaccination policy&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Phew. There was a lot to cover, but we succeeded somehow! Good job if you stayed with me until here! ❤️. Now, after we have covered the theory, you can check out some implementations of &lt;strong&gt;node embedding&lt;/strong&gt; algorithms, for &lt;a href="https://memgraph.com/docs/mage/query-modules/python/node2vec?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=blog_repost"&gt;static&lt;/a&gt; and &lt;a href="https://memgraph.com/docs/mage/query-modules/python/node2vec-online?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=blog_repost"&gt;dynamic graphs&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;Our team of engineers is currently tackling the problem of graph analytics algorithms on &lt;strong&gt;real-time data&lt;/strong&gt;. If you want to discuss how to apply &lt;strong&gt;online/streaming algorithms&lt;/strong&gt; on connected data, feel free to join our &lt;strong&gt;&lt;a href="https://memgr.ph/join-discord"&gt;Discord server&lt;/a&gt;&lt;/strong&gt; and message us.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MAGE&lt;/strong&gt; shares his wisdom on a &lt;a href="https://twitter.com/intent/follow?screen_name=memgraphmage"&gt;&lt;strong&gt;Twitter&lt;/strong&gt; channel&lt;/a&gt;. Get to know him better by following him 🐦&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--9ot-Llaq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://public-assets.memgraph.com/node-embeddings/memgraph-tutorial-intro-to-node-embedding-twitter.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--9ot-Llaq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://public-assets.memgraph.com/node-embeddings/memgraph-tutorial-intro-to-node-embedding-twitter.png" alt="memgraph-tutorial-intro-to-node-embedding-twitter" width="750" height="180"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Last but not least, check out &lt;a href="https://github.com/memgraph/mage"&gt;&lt;strong&gt;MAGE&lt;/strong&gt;&lt;/a&gt; and don't hesitate to give a star ⭐ or contribute with new ideas.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://memgraph.com/blog?topics=Graph+Algorithms&amp;amp;utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=blog_repost&amp;amp;utm_content=banner#list"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--NWKqzkwo--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://public-assets.memgraph.com/external/memgraph-read-more-gradient-1200.png" alt="Read more about real-time analytics on memgraph.com" width="880" height="186"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Run Link Prediction or Node Classification Algorithms and Write Custom Procedures in C++ With Mage 1.4</title>
      <dc:creator>Antonio Filipovic</dc:creator>
      <pubDate>Wed, 07 Dec 2022 06:33:23 +0000</pubDate>
      <link>https://dev.to/memgraph/run-link-prediction-or-node-classification-algorithms-and-write-custom-procedures-in-c-with-mage-14-5903</link>
      <guid>https://dev.to/memgraph/run-link-prediction-or-node-classification-algorithms-and-write-custom-procedures-in-c-with-mage-14-5903</guid>
      <description>&lt;p&gt;In the new release of &lt;a href="https://memgraph.com/?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=blog_repost" rel="noopener noreferrer"&gt;Memgraph’s&lt;/a&gt; open-source graph extension library &lt;a href="https://memgraph.com/mage/?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=blog_repost" rel="noopener noreferrer"&gt;MAGE&lt;/a&gt;, we focused on supporting graph machine learning. MAGE 1.4 now enables you to classify graph nodes and predict new relationships using the &lt;strong&gt;node classification&lt;/strong&gt; and &lt;strong&gt;link prediction&lt;/strong&gt; algorithms. &lt;/p&gt;

&lt;p&gt;We also wanted to extend &lt;a href="https://github.com/memgraph/mage" rel="noopener noreferrer"&gt;MAGE&lt;/a&gt; towards the C++ community even more and created the C++ API towards Memgraph database. Writing graph algorithms in C++ now comes close to working in Python since you don’t need to worry about handling memory and working with unnecessary interfaces.&lt;/p&gt;

&lt;p&gt;If you are also familiar with the igraph library, you'll be happy to hear that we integrated it into MAGE, and the newly integrated k-means algorithm will help you cluster your data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Link prediction
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://memgraph.com/docs/mage/query-modules/python/link-prediction-with-gnn/?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=blog_repost" rel="noopener noreferrer"&gt;Link prediction&lt;/a&gt; tries to predict new relationships by &lt;strong&gt;generalizing on unseen nodes&lt;/strong&gt; at inference time. Inside the module, you can choose to work on link prediction using GraphSAGE or GAT. The module was integrated using &lt;a href="https://www.dgl.ai/" rel="noopener noreferrer"&gt;DGL&lt;/a&gt; implementation, and it supports a lot of different logging metrics, as well as storing models after a certain number of epochs.&lt;/p&gt;

&lt;p&gt;One example of what you can do with the link prediction algorithm is to recommend new services for customers by using a query similar to this one:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;n:&lt;/span&gt;&lt;span class="n"&gt;Customer&lt;/span&gt; &lt;span class="ss"&gt;{&lt;/span&gt;&lt;span class="py"&gt;id:&lt;/span&gt; &lt;span class="s2"&gt;"1658"&lt;/span&gt;&lt;span class="ss"&gt;})&lt;/span&gt;
&lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;s:&lt;/span&gt;&lt;span class="n"&gt;Service&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="nf"&gt;collect&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;services&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;
&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;link_prediction.recommended_vertex&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="n"&gt;services&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt; &lt;span class="k"&gt;YIELD&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;RETURN&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="ss"&gt;,&lt;/span&gt; &lt;span class="n"&gt;recommendation&lt;/span&gt;&lt;span class="ss"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Node classification
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://memgraph.com/docs/mage/query-modules/python/node-classification-with-gnn/?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=blog_repost" rel="noopener noreferrer"&gt;Node classification&lt;/a&gt; determines the labeling of samples (represented as nodes) by looking at the labels of their neighbors. It is motivated by &lt;strong&gt;homophily&lt;/strong&gt;, which means &lt;strong&gt;"love of sameness”&lt;/strong&gt; based on the sociological theory that similar things will group. The following module supports different layer types, loading and storing models, and much more. &lt;/p&gt;

&lt;p&gt;With node classification, you can work on fraud prediction by using a query like the one below to determine if a certain user is a fraudster or not:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="k"&gt;MATCH&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="ss"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;id&lt;/span&gt;&lt;span class="dl"&gt;:&lt;/span&gt;&lt;span class="m"&gt;1658&lt;/span&gt;&lt;span class="ss"&gt;})&lt;/span&gt; 
&lt;span class="k"&gt;CALL&lt;/span&gt; &lt;span class="n"&gt;node_classification.predict&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt; &lt;span class="k"&gt;YIELD&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; 
&lt;span class="k"&gt;RETURN&lt;/span&gt; &lt;span class="n"&gt;predicted_class&lt;/span&gt;&lt;span class="ss"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  C++ API designed for humans
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://memgraph.com/docs/memgraph/reference-guide/query-modules/api/cpp-api/?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=blog_repost" rel="noopener noreferrer"&gt;The new C++ API&lt;/a&gt; is designed for humans, not robots. We followed best practices to reduce unnecessary cognitive load: the components have simple and consistent interfaces, common use cases require fewer user actions, and the API comes with developer guides and extensive documentation.&lt;/p&gt;

&lt;p&gt;Memory management is probably the main pain point in C++ development. The new C++ API automatically manages the memory used by graph data, saving you time that would otherwise be spent debugging and writing repetitive code.&lt;/p&gt;

&lt;h2&gt;
  
  
  igraph support is here
&lt;/h2&gt;

&lt;p&gt;Furthermore, the &lt;a href="https://memgraph.com/docs/mage/query-modules/python/igraphalg/?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=blog_repost" rel="noopener noreferrer"&gt;igraphalg&lt;/a&gt; module provides a comprehensive set of thin wrappers around some of the algorithms present in the &lt;a href="https://igraph.org/" rel="noopener noreferrer"&gt;igraph&lt;/a&gt; package. The wrapper functions can create an igraph-compatible graph-like object that can stream the native database graph directly, significantly lowering memory usage. &lt;/p&gt;

&lt;p&gt;From this version, MAGE supports &lt;a href="https://memgraph.com/docs/mage/query-modules/python/nxalg/?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=blog_repost" rel="noopener noreferrer"&gt;NetworkX&lt;/a&gt; integration, &lt;a href="https://memgraph.com/docs/mage/query-modules/cuda/cugraph/?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=blog_repost" rel="noopener noreferrer"&gt;cuGraph&lt;/a&gt; to support graph algorithms on CUDA devices, and now igraph. If you need something else, feel free to drop us a comment on &lt;a href="https://github.com/memgraph/mage" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  k-means clustering to group examples
&lt;/h2&gt;

&lt;p&gt;And last but not least, the k-means algorithm clusters the given data by trying to separate samples into n groups of equal variance, minimizing the criterion known as the within-cluster sum-of-squares. Find out more about this algorithm in the &lt;a href="https://memgraph.com/docs/mage/query-modules/python/kmeans/?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=blog_repost" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;You can use this algorithm when you already have embeddings, but clustering is missing. For example, feel free to combine it with &lt;a href="https://memgraph.com/docs/mage/query-modules/python/node2vec/?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=blog_repost" rel="noopener noreferrer"&gt;Node2Vec&lt;/a&gt; or &lt;a href="https://memgraph.com/docs/mage/query-modules/python/node2vec-online/?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=blog_repost" rel="noopener noreferrer"&gt;Node2Vec online&lt;/a&gt; version.&lt;/p&gt;

&lt;h2&gt;
  
  
  What next?
&lt;/h2&gt;

&lt;p&gt;If any of the new features will make your use case easier, &lt;a href="https://memgraph.com/download#memgraph-platform/?utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=blog_repost" rel="noopener noreferrer"&gt;update MAGE&lt;/a&gt; to version &lt;strong&gt;1.4&lt;/strong&gt;. Feel free to leave a comment, report an issue, or give us a star to support our work on &lt;a href="https://github.com/memgraph/mage" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;br&gt;
Also, we are always open to discussion and advice, so drop by our &lt;a href="https://discord.com/invite/memgraph" rel="noopener noreferrer"&gt;Discord Server&lt;/a&gt; and stay informed on everything graph-algorithm-related!&lt;/p&gt;

</description>
      <category>watercooler</category>
    </item>
  </channel>
</rss>
