<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ante Pušić</title>
    <description>The latest articles on DEV Community by Ante Pušić (@antepusic).</description>
    <link>https://dev.to/antepusic</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F977553%2F80dcad8b-c8f2-4e9a-8e54-47e9288160e6.png</url>
      <title>DEV Community: Ante Pušić</title>
      <link>https://dev.to/antepusic</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/antepusic"/>
    <language>en</language>
    <item>
      <title>Analyze Infrastructure Networks With Dynamic Betweenness Centrality</title>
      <dc:creator>Ante Pušić</dc:creator>
      <pubDate>Wed, 22 Feb 2023 17:58:45 +0000</pubDate>
      <link>https://dev.to/memgraph/analyze-infrastructure-networks-with-dynamic-betweenness-centrality-2ad8</link>
      <guid>https://dev.to/memgraph/analyze-infrastructure-networks-with-dynamic-betweenness-centrality-2ad8</guid>
      <description>&lt;p&gt;Remember when undersea internet cables came under attack by... sharks?!&lt;/p&gt;

&lt;p&gt;In this blog post, you will explore how the loss of a connection can affect the global submarine internet cable network, and learn how to use &lt;strong&gt;dynamic betweenness centrality&lt;/strong&gt; to efficiently analyze &lt;strong&gt;streamed data&lt;/strong&gt; in Memgraph.&lt;/p&gt;

&lt;p&gt;The loss of a cable is a textbook example of how a single change can immediately disrupt the &lt;em&gt;entire&lt;/em&gt; network. To enable rapid response in such situations, our &lt;a href="https://memgraph.com/docs/mage/" rel="noopener noreferrer"&gt;&lt;strong&gt;MAGE&lt;/strong&gt;&lt;/a&gt; 🧙‍♂️ team has been adding &lt;em&gt;online&lt;/em&gt; graph algorithms (&lt;a href="https://memgraph.com/docs/mage/algorithms/dynamic-graph-analytics/node2vec-online-algorithm" rel="noopener noreferrer"&gt;&lt;strong&gt;Node2Vec&lt;/strong&gt;&lt;/a&gt;, &lt;a href="https://memgraph.com/docs/mage/algorithms/dynamic-graph-analytics/pagerank-online-algorithm" rel="noopener noreferrer"&gt;&lt;strong&gt;PageRank&lt;/strong&gt;&lt;/a&gt; &amp;amp; &lt;a href="https://memgraph.com/docs/mage/algorithms/dynamic-graph-analytics/community-detection-online-algorithm" rel="noopener noreferrer"&gt;&lt;strong&gt;community detection&lt;/strong&gt;&lt;/a&gt;), whose magic ✨ is in updating previous outputs instead of computing everything anew.&lt;/p&gt;

&lt;h2&gt;
  
  
  Explore the Global Shipping Network
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;p&gt;In this tutorial, you will use Memgraph with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://hub.docker.com/r/memgraph/memgraph-platform" rel="noopener noreferrer"&gt;&lt;strong&gt;Docker&lt;/strong&gt;&lt;/a&gt;, &lt;/li&gt;
&lt;li&gt;
&lt;a href="https://pypi.org/project/gqlalchemy/" rel="noopener noreferrer"&gt;&lt;strong&gt;GQLAlchemy&lt;/strong&gt;&lt;/a&gt;,&lt;/li&gt;
&lt;li&gt;and &lt;a href="https://jupyter.org/install" rel="noopener noreferrer"&gt;&lt;strong&gt;Jupyter Notebook&lt;/strong&gt;&lt;/a&gt; installed.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Data
&lt;/h3&gt;

&lt;p&gt;The dataset used in this blogpost represents the global network of submarine internet cables in the form of a graph whose nodes stand for landing points, the cables connecting them represented as relationships. &lt;/p&gt;

&lt;p&gt;Landing points and cables have unique identifiers (&lt;code&gt;id&lt;/code&gt;), and the&lt;br&gt;
landing points also have names (&lt;code&gt;name&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;A giant thank you to &lt;a href="https://www2.telegeography.com/" rel="noopener noreferrer"&gt;TeleGeography&lt;/a&gt; for sharing the dataset for their &lt;a href="https://www.submarinecablemap.com" rel="noopener noreferrer"&gt;submarine cable map&lt;/a&gt;. ❤️&lt;/p&gt;
&lt;h3&gt;
  
  
  Exploration
&lt;/h3&gt;
&lt;h4&gt;
  
  
  Our Setup
&lt;/h4&gt;

&lt;p&gt;With Docker installed, we will start Memgraph through it using the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-it&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; 7687:7687 &lt;span class="nt"&gt;-p&lt;/span&gt; 3000:3000 memgraph/memgraph-platform
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We will be working with Memgraph from a Jupyter notebook. To interact with Memgraph from there, we use &lt;a href="https://pypi.org/project/gqlalchemy/" rel="noopener noreferrer"&gt;GQLAlchemy&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  Betweenness Centrality
&lt;/h4&gt;

&lt;p&gt;Before we start exploring our graph, let’s quickly refresh that &lt;strong&gt;betweenness centrality&lt;/strong&gt; of a node is the fraction of shortest paths between all pairs pairs of nodes in the graph that pass through it:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F40hm1ek2ty4lie9fvll0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F40hm1ek2ty4lie9fvll0.png" alt="memgraph-betweenness-centrality" width="212" height="73"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the above expression, &lt;em&gt;n&lt;/em&gt; is the node of interest, &lt;em&gt;i&lt;/em&gt;, &lt;em&gt;j&lt;/em&gt; are any two distinct nodes other than &lt;em&gt;n&lt;/em&gt;, and &lt;em&gt;σij(n)&lt;/em&gt; is number of shortest paths from &lt;em&gt;i&lt;/em&gt; to &lt;em&gt;j&lt;/em&gt; (going through &lt;em&gt;n&lt;/em&gt;).&lt;/p&gt;

&lt;p&gt;The analysis of (internet) traffic flows, like what we are doing here, is an established use case for this metric.&lt;/p&gt;

&lt;h3&gt;
  
  
  Jupyter notebook
&lt;/h3&gt;

&lt;p&gt;The Jupyter notebook is &lt;a href="https://github.com/memgraph/jupyter-memgraph-tutorials/tree/main/internet_infrastructure_analysis" rel="noopener noreferrer"&gt;&lt;strong&gt;here&lt;/strong&gt;&lt;/a&gt; – we can now go for a deep dive 🤿 in the data!&lt;/p&gt;

&lt;h4&gt;
  
  
  Preliminaries
&lt;/h4&gt;

&lt;p&gt;First, let’s connect to our instance of Memgraph with GQLAlchemy and load the dataset.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;gqlalchemy&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Memgraph&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;load_dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;statement&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;memgraph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;statement&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;memgraph&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Memgraph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;127.0.0.1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;7687&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="c1"&gt;# connect to running instance
&lt;/span&gt;&lt;span class="n"&gt;memgraph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;drop_database&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;                  &lt;span class="c1"&gt;# make sure it’s empty
&lt;/span&gt;&lt;span class="nf"&gt;load_dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data/input.cyp&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;            &lt;span class="c1"&gt;# load dataset
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Example
&lt;/h4&gt;

&lt;p&gt;With everything set up, calling the &lt;code&gt;betweenness_centrality_online&lt;/code&gt; module is a matter of a single Cypher query. &lt;br&gt;
As we are analyzing how changes affect the undersea internet cable network, we save the computed betweenness centrality scores for later.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;memgraph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    CALL betweenness_centrality_online.set()
    YIELD node, betweenness_centrality
    SET node.bc = betweenness_centrality;
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let’s see which landing points have the highest betweenness centrality score in the network:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;most_central&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memgraph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute_and_fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    MATCH (n: Node)
    RETURN n.id AS id, n.name AS name, n.bc AS bc_score
    ORDER BY bc_score DESC, name ASC
    LIMIT 5;
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;node&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;most_central&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;node&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{'id': 15, 'name': 'Tuas, Singapore', 'bc_score': 0.29099145440251273}
{'id': 16, 'name': 'Fortaleza, Brazil', 'bc_score': 0.13807572163430684}
{'id': 467, 'name': 'Toucheng, Taiwan', 'bc_score': 0.13361801370831092}
{'id': 62, 'name': 'Manado, Indonesia', 'bc_score': 0.12915295031722301}
{'id': 123, 'name': 'Balboa, Panama', 'bc_score': 0.12783714460527598}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two of the above results, Singapore and Panama, have become international trade hubs owing to their favorable geographic position. They are highly influential nodes in other networks as well.&lt;/p&gt;

&lt;h4&gt;
  
  
  Dynamic Operation
&lt;/h4&gt;

&lt;p&gt;This part brings us to MAGE’s newest algorithm – &lt;strong&gt;iCentral&lt;/strong&gt; dynamic betweenness centrality by &lt;a href="http://www.fuadjamour.com/" rel="noopener noreferrer"&gt;Fuad Jamour&lt;/a&gt; and others.&lt;a href="https://repository.kaust.edu.sa/bitstream/handle/10754/625935/08070346.pdf" rel="noopener noreferrer"&gt;&lt;sup&gt;[1]&lt;/sup&gt;&lt;/a&gt;. &lt;br&gt;
After each graph update, iCentral can be run to update previously computed values without having to process the entire graph, going hand in hand with Memgraph’s &lt;strong&gt;graph streaming&lt;/strong&gt; capabilities.&lt;/p&gt;

&lt;p&gt;You can set this up in Memgraph with &lt;a href="https://memgraph.com/docs/memgraph/reference-guide/triggers" rel="noopener noreferrer"&gt;&lt;strong&gt;triggers&lt;/strong&gt;&lt;/a&gt; – pieces of Cypher code that run after database transactions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;memgraph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    CREATE TRIGGER update_bc 
    BEFORE COMMIT EXECUTE 
        CALL betweenness_centrality_online.update(createdVertices, createdEdges, deletedVertices, deletedEdges) 
        YIELD *;
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffcgsnzkqfhdmk7clfvo0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffcgsnzkqfhdmk7clfvo0.png" alt="memgraph-shark-cuts-internet-cable" width="590" height="421"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let’s now see what happens when a shark (or something else) cuts off a submarine internet cable between Tuas in Singapore and Jeddah in Saudi Arabia.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;memgraph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;MATCH (n {name: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Tuas, Singapore&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;})-[e]-(m {name: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Jeddah, Saudi Arabia&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;}) DELETE e;&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The above transaction activates the &lt;code&gt;update_bc&lt;/code&gt; trigger, and previously computed betweenness centrality scores are updated using iCentral.&lt;/p&gt;

&lt;p&gt;With the cable out of function, internet data must be transmitted over different routes. Some nodes in the network are bound to experience increased strain and internet speed might thus deteriorate. These nodes likely saw their betweenness centrality score increase. To inspect that, we’ll retrieve the new scores with &lt;code&gt;betweenness_centrality_online.get()&lt;/code&gt; and compare them with the previously saved ones.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;highest_deltas&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memgraph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute_and_fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    CALL betweenness_centrality_online.get()
    YIELD node, betweenness_centrality
    RETURN 
        node.id AS id,
        node.name AS name, 
        node.bc AS old_bc,
        betweenness_centrality AS bc,
        betweenness_centrality - node.bc AS delta
    ORDER BY abs(delta) DESC, name ASC
    LIMIT 5;
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;highest_deltas&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;memgraph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DROP TRIGGER update_bc;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{'id': 111, 'name': 'Jeddah, Saudi Arabia', 'old_bc': 0.061933737931979434, 'bc': 0.004773934386713466, 'delta': -0.057159803545265966}
{'id': 352, 'name': 'Songkhla, Thailand', 'old_bc': 0.05259842296405675, 'bc': 0.07514804741735281, 'delta': 0.022549624453296065}
{'id': 15, 'name': 'Tuas, Singapore', 'old_bc': 0.29099145440251273, 'bc': 0.2730690696075149, 'delta': -0.017922384794997803}
{'id': 175, 'name': 'Yanbu, Saudi Arabia', 'old_bc': 0.0648358824682235, 'bc': 0.07561992914231867, 'delta': 0.010784046674095174}
{'id': 210, 'name': 'Dakar, Senegal', 'old_bc': 0.08708567541545133, 'bc': 0.09412362761485257, 'delta': 0.007037952199401246}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As seen above, the network landing point in Songkhla, Thailand had its score increase by &lt;strong&gt;42.87%&lt;/strong&gt; after the update. Conversely, other landing points became less connected to the rest of the network: the centrality of the Jeddah node in Saudi Arabia almost dropped to zero.&lt;/p&gt;

&lt;h4&gt;
  
  
  Performance
&lt;/h4&gt;

&lt;p&gt;iCentral builds upon the Brandes algorithm&lt;a href="https://pdodds.w3.uvm.edu/research/papers/others/2001/brandes2001a.pdf" rel="noopener noreferrer"&gt;&lt;sup&gt;[2]&lt;/sup&gt;&lt;/a&gt; and adds the following improvements in order to increase performance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Process only the nodes whose betweenness centrality values change&lt;/strong&gt;: after an update, betweenness centrality scores stay the same outside the affected &lt;a href="https://memgraph.com/docs/mage/algorithms/traditional-graph-analytics/biconnected-components-algorithm" rel="noopener noreferrer"&gt;&lt;strong&gt;biconnected component&lt;/strong&gt;&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Avoid repeating shortest-path calculations&lt;/strong&gt;: use prior output if it’s possible to tell it’s still valid; if new shortest paths are needed, update the prior ones instead of recomputing.

&lt;ul&gt;
&lt;li&gt;Breadth-first search for computing graph dependencies does not need to be done out of nodes equidistant to both endpoints of the updated relationship.&lt;/li&gt;
&lt;li&gt;The BFS tree used for computing new graph dependencies (the contributions of a node to other nodes’ scores) can be determined from the tree obtained while computing old graph dependencies.
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;bcc_partition&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memgraph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute_and_fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    CALL biconnected_components.get()
    YIELD bcc_id, node_from, node_to
    RETURN
        bcc_id,
        node_from.id AS from_id,
        node_from.name AS from_name,
        node_to.id AS to_id,
        node_to.name AS to_name
    LIMIT 15;
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;relationship&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;bcc_partition&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;relationship&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Graphs of infrastructural networks, such as this one, fairly often consist of a number of smaller biconnected components (BCCs). As iCentral recognizes that betweenness centrality scores are unchanged outside the affected BCC, this can result in a significant speedup.&lt;/p&gt;

&lt;h4&gt;
  
  
  Algorithms: Online vs. Offline
&lt;/h4&gt;

&lt;p&gt;An important property of algorithms is whether they are &lt;em&gt;online&lt;/em&gt; or &lt;em&gt;offline&lt;/em&gt;. Online algorithms can update their output as more data becomes available, whereas offline algorithms have to redo the entire computation.&lt;/p&gt;

&lt;p&gt;The gold-standard offline algorithm for betweenness centrality is the one by Ulrik Brandes&lt;a href="https://pdodds.w3.uvm.edu/research/papers/others/2001/brandes2001a.pdf" rel="noopener noreferrer"&gt;&lt;sup&gt;[2]&lt;/sup&gt;&lt;/a&gt;: it works by building a shortest path tree from each node of the graph and efficiently counting the shortest paths through dynamic programming.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;In &lt;a href="https://memgraph.com/blog/identifying-essential-proteins-betweenness-centrality" rel="noopener noreferrer"&gt;How to Identify Essential Proteins using Betweenness Centrality&lt;/a&gt; we built a web app to visualize protein-protein interaction networks with help of betweenness centrality.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;However, we can easily see that updates often change only a tiny piece of the whole graph. Scalability means that one needs to take advantage of this by cutting down on repetition. To this end, we employed the fastest algorithm so far: &lt;strong&gt;iCentral&lt;/strong&gt;.&lt;br&gt;
Let’s see how it stacks up against the Brandes algorithm in complexity.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Brandes: runs in &lt;em&gt;O&lt;/em&gt;(|&lt;em&gt;V&lt;/em&gt;||&lt;em&gt;E&lt;/em&gt;|) time and uses &lt;em&gt;O&lt;/em&gt;(|&lt;em&gt;V&lt;/em&gt;| + |&lt;em&gt;E&lt;/em&gt;|) space on a graph with |&lt;em&gt;V&lt;/em&gt;| nodes and |&lt;em&gt;E&lt;/em&gt;| relationships,&lt;/li&gt;
&lt;li&gt;iCentral: runs in &lt;em&gt;O&lt;/em&gt;(|&lt;em&gt;Q&lt;/em&gt;||&lt;em&gt;EBC&lt;/em&gt;|) time and uses &lt;em&gt;O&lt;/em&gt;(|&lt;em&gt;VBC&lt;/em&gt;| + |&lt;em&gt;EBC&lt;/em&gt;|) space. |&lt;em&gt;VBC&lt;/em&gt;| and |&lt;em&gt;EBC&lt;/em&gt;| are the counts of nodes and relationships in the affected portion of the graph; |&lt;em&gt;Q&lt;/em&gt;| ≤ |&lt;em&gt;VBC&lt;/em&gt;| (see the &lt;strong&gt;Performance&lt;/strong&gt; section for the |&lt;em&gt;Q&lt;/em&gt;| set).
NB: iCentral also saves time by avoiding repeated shortest-path calculations where possible; this varies by graph.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Another key trait of iCentral is that it can be run &lt;strong&gt;fully in parallel&lt;/strong&gt;, just like the Brandes algorithm. With &lt;em&gt;N&lt;/em&gt; parallel instances, this has the algorithm run &lt;em&gt;N&lt;/em&gt; times faster, at the expense of requiring &lt;em&gt;N&lt;/em&gt; times more space (each thread keeps a copy of the data structures). &lt;/p&gt;

&lt;h2&gt;
  
  
  Takeaways
&lt;/h2&gt;

&lt;p&gt;Betweenness centrality is a very common graph analytics tool, but it is nevertheless challenging to scale up to dynamic graphs. To solve this, Memgraph has implemented the fastest yet online algorithm for it – &lt;a href="https://github.com/memgraph/mage/tree/main/cpp/betweenness_centrality_module" rel="noopener noreferrer"&gt;&lt;strong&gt;iCentral&lt;/strong&gt;&lt;/a&gt;; it joins our growing suite of streaming graph analytics. &lt;/p&gt;

&lt;p&gt;Our R&amp;amp;D team is working hard on streaming graph ML and analytics. We’re happy to discuss it with you – ping us at &lt;a href="https://memgr.ph/join-discord" rel="noopener noreferrer"&gt;&lt;strong&gt;Discord&lt;/strong&gt;&lt;/a&gt;!&lt;/p&gt;

&lt;h2&gt;
  
  
  What’s Next?
&lt;/h2&gt;

&lt;p&gt;It’s time to build with Memgraph! 🔨&lt;/p&gt;

&lt;p&gt;Check out our &lt;a href="https://github.com/memgraph/mage" rel="noopener noreferrer"&gt;&lt;strong&gt;MAGE&lt;/strong&gt;&lt;/a&gt; open-source graph analytics suite and don’t hesitate to give a star ⭐ or contribute with new ideas. If you have any questions or you want to share your work with the rest of the community, join our &lt;a href="https://memgr.ph/join-discord" rel="noopener noreferrer"&gt;&lt;strong&gt;Discord server&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;p&gt;[1] Jamour, F., Skiadopoulos, S., &amp;amp; Kalnis, P. (2017). Parallel algorithm for incremental betweenness centrality on large graphs. &lt;em&gt;IEEE Transactions on Parallel and Distributed Systems&lt;/em&gt;, 29(3), 659-672.&lt;/p&gt;

&lt;p&gt;[2] Brandes, U. (2001). A faster algorithm for betweenness centrality. &lt;em&gt;Journal of Mathematical Sociology&lt;/em&gt;, 25(2), 163-177.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://memgraph.com/blog?topics=Use+Cases&amp;amp;utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=blog_repost&amp;amp;utm_content=banner#list" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw0azgpsgm3wp9w5sd5wu.png" alt="Read more about graph use cases on memgraph.com" width="800" height="169"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>help</category>
      <category>career</category>
      <category>discuss</category>
    </item>
    <item>
      <title>Monitoring a Dynamic Contact Network With Online Community Detection</title>
      <dc:creator>Ante Pušić</dc:creator>
      <pubDate>Mon, 06 Feb 2023 15:50:39 +0000</pubDate>
      <link>https://dev.to/memgraph/monitoring-a-dynamic-contact-network-with-online-community-detection-4kdo</link>
      <guid>https://dev.to/memgraph/monitoring-a-dynamic-contact-network-with-online-community-detection-4kdo</guid>
      <description>&lt;p&gt;This tutorial is a sequel to &lt;a href="https://memgraph.com/blog/labelrankt-community-detection-in-dynamic-environment"&gt;&lt;strong&gt;LabelRankT – Community Detection in Dynamic Environment&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;The newest spell in MAGE’s book is &lt;a href="https://memgraph.com/blog/labelrankt-community-detection-in-dynamic-environment"&gt;&lt;strong&gt;Online Community Detection&lt;/strong&gt;&lt;/a&gt;. – an efficient, high-performance algorithm for detecting communities in networks that change with time.&lt;/p&gt;

&lt;p&gt;As MAGE wants to use his knowledge to help people, in this tutorial you will learn with him how to build a utility that monitors a dynamic contact network. The utility will a) use the detected communities to show rumor-spreading clusters and b) track the average cluster size.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How did our magician learn about all of this, though?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It was again winter – too soon, MAGE sighed – and something odd was happening with his sage 🧙 friends. They were arguing with one another, and between fights, they were wondering what was up.&lt;br&gt;
Soon they found out that rumors had somehow spread among them. Losing no time, MAGE set off to find a way to make sense of the situation with what he knows best – graphs. Having built a network that tracked their close contacts through time, he pored over his books until he found a suitable &lt;a href="https://arxiv.org/pdf/1305.2006.pdf"&gt;&lt;strong&gt;algorithm&lt;/strong&gt;&lt;/a&gt; for analyzing it. [1]&lt;/p&gt;
&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;To complete this tutorial, you will need:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;a href="https://memgraph.com/docs/mage/installation"&gt;&lt;strong&gt;MAGE&lt;/strong&gt;&lt;/a&gt; – Memgraph’s very own graph analytics library&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/memgraph/gqlalchemy"&gt;&lt;strong&gt;gqlalchemy&lt;/strong&gt;&lt;/a&gt; – a Python driver and object graph mapper (OGM)&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  Graph
&lt;/h2&gt;

&lt;p&gt;It is paramount to protect the privacy of personal contact records. Keeping this in mind, we generated a dynamic dataset (250 nodes and &amp;lt; 1700 vertices) of a network of close contacts using the &lt;a href="https://en.wikipedia.org/wiki/Barab%C3%A1si%E2%80%93Albert_model"&gt;Barabási–Albert graph generation model&lt;/a&gt;. This model is suitable for such networks because it makes graphs with two traits that one commonly sees in real-life networks:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;small world property: few degrees of separation between any two nodes in the graph&lt;/li&gt;
&lt;li&gt;power-law degree distribution: the presence of &lt;em&gt;hubs&lt;/em&gt; (highly connected nodes) in the graph&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Our graph consists of &lt;code&gt;Person&lt;/code&gt; nodes with a &lt;code&gt;contactID&lt;/code&gt; attached. If two &lt;code&gt;Persons&lt;/code&gt; have been in close contact, there exists between them a relationship &lt;code&gt;CONTACTED&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--WK1oT_0D--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://public-assets.memgraph.com/monitoring-dynamic-contact-network-with-online-community-detection/memgraph-tutorial-community-detection-graph-schema.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--WK1oT_0D--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://public-assets.memgraph.com/monitoring-dynamic-contact-network-with-online-community-detection/memgraph-tutorial-community-detection-graph-schema.png" alt="memgraph-tutorial-contact-network-monitor-graph-schema" width="880" height="264"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;
&lt;center&gt;Image 1. Graph schema&lt;/center&gt;
&lt;br&gt;
&lt;br&gt;

&lt;p&gt;Contacts of age over the rumor’s “expiration date” do not play a role in rumor spreading and are thus dropped. Consequently, this means that communities of close contacts are not static; instead, they change with time, and communities need to be updated after each update anew.&lt;/p&gt;
&lt;h2&gt;
  
  
  Algorithm
&lt;/h2&gt;

&lt;p&gt;We will do our network analysis with the new &lt;strong&gt;LabelRankT&lt;/strong&gt; method described in the previous post. LabelRankT is an online algorithm that partitions the network into communities and returns nodes with their community labels.&lt;/p&gt;

&lt;p&gt;In a nutshell, the algorithm does two main operations:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;label propagation: assign each node a label and pass them along edges for several iterations&lt;/li&gt;
&lt;li&gt;update: find the changed nodes and run label propagation on them only

&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Now, why did MAGE use this algorithm?&lt;/p&gt;

&lt;p&gt;Let’s remember that we defined communities as sets of densely interconnected nodes. These are what our algorithm detects – as labels flow through the network, the sets quickly converge on a single one.&lt;/p&gt;

&lt;p&gt;As for our task, this idea allows us to extend the notion of “close contact” to people who haven’t had direct contact with a rumor spreader if they are sufficiently connected to the people who have. In other words, rumors can quickly spread through the network, much like a chain reaction would.&lt;br&gt;
&lt;u&gt;In effect, communities are essentially rumor-spreading clusters.&lt;/u&gt; This way, we can cast a wider net and inform more people about possible exposure to misinformation.&lt;/p&gt;
&lt;h2&gt;
  
  
  Loading the dataset
&lt;/h2&gt;

&lt;p&gt;The dataset is a file whose entries are structured as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+|-,contactID_1,contactID_2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the initial operator is &lt;code&gt;+&lt;/code&gt;, the entry adds a contact between &lt;code&gt;contactID_1&lt;/code&gt; and &lt;code&gt;contactID_2&lt;/code&gt; to the network; &lt;code&gt;-&lt;/code&gt; does the opposite.&lt;/p&gt;

&lt;p&gt;Memgraph supports graph streaming. When working with a dynamic network, one would usually create a stream and connect it and Memgraph. &lt;br&gt;
However, we’re focused on rumor spreading and working with streams is a bit beside the point. Instead, we’ll simulate a stream with this little snippet of code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class Stream:
    i = 0
    data = [['+',120,954], ...]

    CREATE_EDGE = 'MERGE (a: Person {{contactID: {}}}) 
                   MERGE (b: Person {{id: {}}}) 
                   CREATE (a)-[r: CONTACTED]-&amp;gt;(b);'
    DELETE_EDGE = 'MATCH (a: Person {{contactID: {}}})-[r: CONTACTED]-&amp;gt;(a: Person {{contactID: {}}})
                   DELETE r;'

    def get_next(self):
        if self.i &amp;gt;= len(self.data):
            return None
        operation, node_1, node_2 = *self.data[self.i]
        self.i += 1
        if operation == '+':
            return self.CREATE_EDGE.format(node_1, node_2)
        elif operation == '-':
            return self.DELETE_EDGE.format(node_1, node_2)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Setting up detection
&lt;/h2&gt;

&lt;p&gt;We will need two procedures: &lt;a href=""&gt;&lt;strong&gt;&lt;code&gt;set()&lt;/code&gt;&lt;/strong&gt;&lt;/a&gt; initializes the community detection algorithm and &lt;a href=""&gt;&lt;strong&gt;&lt;code&gt;update()&lt;/code&gt;&lt;/strong&gt;&lt;/a&gt; gets the new communities after each graph change. &lt;/p&gt;

&lt;p&gt;Initialization with &lt;code&gt;set()&lt;/code&gt; should look like the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CALL dynamic_community_detection.set()
YIELD node, community_id
RETURN node.id AS node_id, community_id
ORDER BY node_id;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This method lets the user set the parameters of the algorithm, as detailed in the &lt;a href="https://memgraph.com/docs/mage/query-modules/cpp/community-detection-online#setdirected-weighted-similarity_threshold-exponent-min_value-weight_property-w_selfloop-max_iterations-max_updates"&gt;documentation&lt;/a&gt;. Note that we are using the default values in this article.&lt;/p&gt;

&lt;p&gt;&lt;a href=""&gt;&lt;strong&gt;Triggers&lt;/strong&gt;&lt;/a&gt; are a Memgraph functionality that lets users set openCypher statements to run in response to graph updates. We will handle community updating with a trigger that activates after every graph change:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TRIGGER test_edges_changed
BEFORE COMMIT EXECUTE
CALL dynamic_community_detection.update(
    createdVertices,
    createdEdges,
    updatedVertices,
    updatedEdges,
    deletedVertices,
    deletedEdges) YIELD *;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The arguments passed to &lt;code&gt;update()&lt;/code&gt; are predefined within the trigger. For a comprehensive list of trigger events and their predefined variables, take a look &lt;a href="https://memgraph.com/docs/memgraph/database-functionalities/triggers/#predefined-variables"&gt;&lt;strong&gt;here&lt;/strong&gt;&lt;/a&gt; in the documentation. &lt;/p&gt;

&lt;h2&gt;
  
  
  Putting it all together
&lt;/h2&gt;

&lt;p&gt;Finally, we’re going to put this all together. 🚀&lt;/p&gt;

&lt;p&gt;The main loop of our code holds the bulk of our utility’s logic. It handles the tasks of updating the graph, community detection, and returning rumor-spreading clusters and their mean sizes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
memgraph = Memgraph("127.0.0.1", 7687)

example_stream = Stream()
memgraph.execute(query=INIT_ALGORITHM)
memgraph.execute(query=UPDATE_TRIGGER)

while True:
    # update graph and run community detection

    next_change = example_stream.get_next()
    if not next_change:
        break
    memgraph.execute(query=next_change)
    results = memgraph.execute_and_fetch(query=GET_RESULTS)


    # for a random node, find its cluster

    nodes = []
    for result in results:
        node, _ = list(result.values())
        nodes.append(node)

    random_node = random.choice(nodes)
    cluster = []
    for result in results:
        node, community = list(result.values())
        if node == random_node:
            cluster.append(node)


    # calculate mean cluster size

    sizes = {}
    for result in results:
        node, community = list(result.values())
        if community not in sizes.keys():
            sizes[community] = 0
        else:
            sizes[community] += 1
    sizes = list(sizes.values())
    mean_size = sum(sizes) / len(sizes)


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;p&gt;To wrap up this demo, here’s a plot of mean cluster/community size by epoch:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--JdqWwlPp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://public-assets.memgraph.com/monitoring-dynamic-contact-network-with-online-community-detection/memgraph-tutorial-community-detection-cluster-size.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--JdqWwlPp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://public-assets.memgraph.com/monitoring-dynamic-contact-network-with-online-community-detection/memgraph-tutorial-community-detection-cluster-size.png" alt="memgraph-tutorial-community-detection-cluster-size" width="640" height="480"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;center&gt;Image 2. Cluster size increases as the graph fills out; larger sizes may imply quicker spread and vice versa. &lt;/center&gt;
&lt;br&gt;
&lt;br&gt;
&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This article aimed to present the efficiency and output quality of MAGE’s new online community detection algorithm on a dynamic network, stressing the insights that one can glean from the communities. With the ongoing rise in data streaming, the demand for algorithms that can handle large volumes of data and produce useful results is rising, and our algorithm is one of them.&lt;/p&gt;

&lt;p&gt;Our team of engineers is currently tackling the problem of graph analytics algorithms on &lt;strong&gt;real-time data&lt;/strong&gt;. If you want to discuss how to apply &lt;strong&gt;online/streaming algorithms&lt;/strong&gt; on connected data, feel free to join our &lt;strong&gt;&lt;a href="https://memgr.ph/join-discord"&gt;Discord server&lt;/a&gt;&lt;/strong&gt; and message us.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MAGE&lt;/strong&gt; shares his wisdom on a &lt;a href="https://twitter.com/intent/follow?screen_name=memgraphmage"&gt;&lt;strong&gt;Twitter&lt;/strong&gt; account&lt;/a&gt;. Get to know him better by following him 🐦&lt;/p&gt;

&lt;p&gt;&lt;a href="https://twitter.com/intent/follow?screen_name=memgraphmage"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--T8eHVj8u--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://public-assets.memgraph.com/labelrankt-community-detection-in-dynamic-environment/labelrankt-twitter-mage.jpg" alt="memgraph-labelrankt-tutorial-mage-twitter" width="880" height="237"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Last but not least, check out &lt;a href="https://github.com/memgraph/mage"&gt;&lt;strong&gt;MAGE&lt;/strong&gt;&lt;/a&gt; and don’t hesitate to give a star ⭐ or contribute with new ideas.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;p&gt;[1] Jierui Xie, Mingming Chen, Boleslaw K. Szymanski: “LabelRankT: Incremental Community Detection in Dynamic Networks via Label Propagation”, 2013, Proc. DyNetMM 2013 at SIGMOD/PODS 2013, New York, NY, 2013; &lt;a href="http://arxiv.org/abs/1305.2006"&gt;arXiv:1305.2006&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://memgraph.com/blog?topics=Graph+Algorithms&amp;amp;utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=blog_repost&amp;amp;utm_content=banner#list"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--NWKqzkwo--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://public-assets.memgraph.com/external/memgraph-read-more-gradient-1200.png" alt="Read more about graph algorithms on memgraph.com" width="880" height="186"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>LabelRankT – Community Detection in Dynamic Environment</title>
      <dc:creator>Ante Pušić</dc:creator>
      <pubDate>Thu, 02 Feb 2023 14:20:24 +0000</pubDate>
      <link>https://dev.to/memgraph/labelrankt-community-detection-in-dynamic-environment-2a4l</link>
      <guid>https://dev.to/memgraph/labelrankt-community-detection-in-dynamic-environment-2a4l</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Community detection&lt;/strong&gt; helps us uncover hidden relations among nodes in a graph and find sets of nodes with some characteristics in common. Real-life networks often show community structure, and by figuring it out, one can tackle a wide range of problems.&lt;/p&gt;

&lt;p&gt;If you’re doing graph analytics, the chances are that you have run community detection on the dataset. Algorithms take more time to run on large graphs, and handling the volume of work that comes along with a &lt;strong&gt;large&lt;/strong&gt; and &lt;strong&gt;frequently updated&lt;/strong&gt; dataset is an engineering problem. It makes sense to wonder if it’s possible to leverage the &lt;strong&gt;small size of an average update&lt;/strong&gt; to deliver a performance boost.&lt;/p&gt;

&lt;p&gt;We at Memgraph recognize your challenges. In this article, you will learn about the merits of &lt;em&gt;online&lt;/em&gt; community detection methods and get acquainted with the &lt;a href="https://arxiv.org/abs/1305.2006" rel="noopener noreferrer"&gt;LabelRankT&lt;/a&gt; algorithm by &lt;a href="https://arxiv.org/search/cs?searchtype=author&amp;amp;query=Xie%2C+J" rel="noopener noreferrer"&gt;Xie&lt;/a&gt; et al., now available in &lt;a href="https://github.com/memgraph/mage/releases/tag/v1.1" rel="noopener noreferrer"&gt;MAGE 1.1&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Flabelrankt-community-detection-in-dynamic-environment%2Fmemgraph-tutorial-labelrankt-network-illustration.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Flabelrankt-community-detection-in-dynamic-environment%2Fmemgraph-tutorial-labelrankt-network-illustration.png" alt="memgraph-tutorial-labelrankt-network-illustration"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;Image 1. LabelRank network illustration&lt;/center&gt;

&lt;h2&gt;
  
  
  The case for online algorithms
&lt;/h2&gt;

&lt;p&gt;Picture a &lt;strong&gt;family&lt;/strong&gt; or a &lt;strong&gt;group of close friends&lt;/strong&gt; – they feel a strong sense of community to one another, and their mutual relationships run deeper than those with people outside the bunch. &lt;br&gt;
Now, let’s say that the family got a &lt;strong&gt;new arrival&lt;/strong&gt; or that a new friend joined the crew. These communities changed, but this doesn’t imply that others did so. In other words, the change was &lt;strong&gt;&lt;em&gt;local&lt;/em&gt;&lt;/strong&gt; in scope.&lt;/p&gt;

&lt;p&gt;Modern tech operations often work with big data, which is so &lt;em&gt;in more than one way&lt;/em&gt;: individual datasets both reach large sizes and receive frequent updates. Scalability means that one needs to cut down on repetition and leverage that updates often change just a tiny piece of the data. One needs &lt;strong&gt;online algorithms&lt;/strong&gt; – methods that process data as it comes.&lt;/p&gt;

&lt;p&gt;Community detection lends itself well to online solutions for two reasons: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Complexity&lt;/strong&gt;: many methods have high time complexity that scales with the number of nodes in the network&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Locality&lt;/strong&gt;: community changes tend to be local in scope after partial updates.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Graph streaming platforms such as Memgraph are natively dynamic environments, stressing the need for specialized algorithms on top of the usual performance considerations.&lt;br&gt;
To equip Memgraph with a suitable algorithm, the &lt;strong&gt;MAGE team&lt;/strong&gt; peered into &lt;del&gt;spellbooks&lt;/del&gt; academic research and implemented &lt;a href="https://arxiv.org/abs/1305.2006" rel="noopener noreferrer"&gt;&lt;strong&gt;LabelRankT&lt;/strong&gt;&lt;/a&gt; (Xie et al.).&lt;/p&gt;

&lt;h2&gt;
  
  
  The LabelRankT algorithm
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;LabelRankT&lt;/strong&gt; is an un-/semi-supervised machine learning algorithm made for &lt;strong&gt;online community detection&lt;/strong&gt; on networks. It takes a graph as input and returns an assignment of nodes to the communities it detected; nodes belong to a single community.&lt;br&gt;
Offline and online modes of operation are both supported; the latter takes advantage of the communities found in the previous iteration of the graph, whereas the former detects them &lt;em&gt;ab initio&lt;/em&gt;.&lt;br&gt;
The algorithm supports weighted and directed graphs by design.&lt;/p&gt;

&lt;h3&gt;
  
  
  Concepts
&lt;/h3&gt;

&lt;p&gt;In network science, we define a &lt;strong&gt;community&lt;/strong&gt; as a set of nodes that are densely connected internally. &lt;/p&gt;

&lt;p&gt;Label propagation algorithms initialize every node with a unique &lt;strong&gt;label&lt;/strong&gt;, and then these diffuse through the network. Consequently, densely interconnected groups – communities – quickly reach a common label.&lt;/p&gt;

&lt;p&gt;Nodes do not wholly belong to a single community under LabelRankT. Instead, every node has an associated &lt;strong&gt;label distribution vector&lt;/strong&gt; composed of &lt;strong&gt;label probabilities&lt;/strong&gt;. The vector is &lt;strong&gt;scaled&lt;/strong&gt; so that the probabilities sum to &lt;em&gt;1&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;self-loop&lt;/strong&gt; is added to each node to stabilize the detection results across iterations. &lt;/p&gt;

&lt;h3&gt;
  
  
  Detection
&lt;/h3&gt;

&lt;p&gt;The algorithm starts by assigning a &lt;strong&gt;unique label&lt;/strong&gt; to every node in the graph. &lt;br&gt;
Label distribution vectors are initialized as follows:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Flabelrankt-community-detection-in-dynamic-environment%2Fmemgraph-tutorial-label-distribution-vector.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Flabelrankt-community-detection-in-dynamic-environment%2Fmemgraph-tutorial-label-distribution-vector.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In other words, the probability that node &lt;em&gt;i&lt;/em&gt; belongs to community &lt;em&gt;j&lt;/em&gt; is the ratio of &lt;em&gt;wⱼᵢ&lt;/em&gt; and the total weight of all edges that lead to &lt;em&gt;i&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Label propagation is an iterative process with four steps per iteration. Below follows an abridged description of the steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Node selection&lt;/strong&gt;: nodes belonging to the same community as at least &lt;em&gt;k&lt;/em&gt;\% of their neighbors skip the next three steps&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Label propagation&lt;/strong&gt;:  for an individual node, its new label distribution is a weighted sum of its in-neighbors’ distributions (incl. its own):&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Flabelrankt-community-detection-in-dynamic-environment%2Fmemgraph-tutorial-label-propagation-formula.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Flabelrankt-community-detection-in-dynamic-environment%2Fmemgraph-tutorial-label-propagation-formula.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Inflation&lt;/strong&gt;: label distribution vectors are raised elementwise to the &lt;em&gt;n*th power and then scaled to sum to *1&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cutoff&lt;/strong&gt;: label probabilities under a set threshold are pruned.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Flabelrankt-community-detection-in-dynamic-environment%2Fmemgraph-tutorial-labelrankt-label-propagation.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Flabelrankt-community-detection-in-dynamic-environment%2Fmemgraph-tutorial-labelrankt-label-propagation.png" alt="memgraph-tutorial-labelrankt-label-propagation"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;Image 2. Label propagation. Node 4 is not taken into account because the connecting edge 1→4 faces &lt;em&gt;away&lt;/em&gt; from node 1.&lt;/center&gt;

&lt;p&gt;&lt;br&gt;&lt;br&gt;
After several iterations, the algorithm converges on a solution and returns the most probable labels for each node. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Node selection&lt;/strong&gt; passes over nodes whose labels are too much like the neighbors’. In those cases, label propagation is likely not to change anything, and skipping it makes sense due to its expensiveness.&lt;br&gt;
The &lt;strong&gt;inflation&lt;/strong&gt; and &lt;strong&gt;cutoff&lt;/strong&gt; steps help optimize the algorithm by cutting down the number of labels stored in the nodes’ label distribution vectors.&lt;/p&gt;

&lt;h3&gt;
  
  
  Online operation
&lt;/h3&gt;

&lt;p&gt;Being an online algorithm, LabelRankT builds its solution incrementally as it adapts to changes in the input. &lt;strong&gt;Changed nodes&lt;/strong&gt; are defined as all nodes that have been added, edited, or deleted, as well as their neighbors. If an edge has been added, edited, or deleted, its endpoint nodes are also considered to be changed.&lt;/p&gt;

&lt;p&gt;Compared to the mathematical operations described in the previous section, the logic of online operation is simple:&lt;/p&gt;

&lt;p&gt;1) Find out which nodes are changed&lt;br&gt;
2) Preserve the community labels of unchanged nodes&lt;br&gt;
3) Re-run community detection on changed nodes only&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Flabelrankt-community-detection-in-dynamic-environment%2Fmemgraph-tutorial-labelrankt-changed-nodes.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Flabelrankt-community-detection-in-dynamic-environment%2Fmemgraph-tutorial-labelrankt-changed-nodes.png" alt="memgraph-tutorial-labelrankt-changed-nodes"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;center&gt;Image 3. After a node is deleted, its neighbors become changed nodes (red).&lt;/center&gt;



&lt;h2&gt;
  
  
  Performance
&lt;/h2&gt;

&lt;p&gt;LabelRankT runs in linear &lt;em&gt;O(m)&lt;/em&gt; time and takes &lt;em&gt;O(mn)&lt;/em&gt; space on a graph with &lt;em&gt;n&lt;/em&gt; nodes and &lt;em&gt;m&lt;/em&gt; edges. Total execution time also depends on the number of iterations set for label propagation.&lt;/p&gt;

&lt;p&gt;This section assesses the performance of LabelRankT on two dimensions: we compare offline community detection methods against offline ones in general and contrast LabelRankT’s properties with those of other community detection algorithms.&lt;/p&gt;

&lt;h3&gt;
  
  
  Online methods
&lt;/h3&gt;

&lt;p&gt;➕ much faster on large, dynamic graphs than offline methods &lt;br&gt;&lt;br&gt;
➕ adaptive to changes in the input data &lt;br&gt;&lt;/p&gt;

&lt;p&gt;➖ assumption that the changes only have local effects &lt;br&gt;&lt;br&gt;
➖ need to store past state(s) in memory&lt;/p&gt;

&lt;h3&gt;
  
  
  LabelRankT
&lt;/h3&gt;

&lt;p&gt;➕ result quality matching offline algorithms &lt;br&gt;&lt;br&gt;
➕ low computational cost &lt;br&gt;&lt;br&gt;
➕ scalable to very large graphs &lt;br&gt;&lt;br&gt;
➕ deterministic and replicable results (unlike other label propagation methods) &lt;br&gt;&lt;br&gt;
➕ compatible with directed and weighted graphs &lt;br&gt;&lt;br&gt;
➕ customizable parameters&lt;/p&gt;

&lt;p&gt;➖ less grounded in theory than statistical and modularity-maximizing algorithms for community detection&lt;/p&gt;

&lt;h2&gt;
  
  
  Applications
&lt;/h2&gt;

&lt;p&gt;Graphs that describe &lt;strong&gt;real-life networks&lt;/strong&gt; show community structure often. &lt;/p&gt;

&lt;p&gt;This insight applies to diverse use cases such as customer segmentation, contact tracing, medical diagnostics, and quantification of environmental &lt;strong&gt;hazards in public health&lt;/strong&gt;.&lt;br&gt;
As communities often correspond to the functional units of systems, additional use cases include the detection of cycles or pathways in &lt;strong&gt;metabolic networks&lt;/strong&gt; and &lt;strong&gt;recommender systems&lt;/strong&gt; where content forms communities sorted by topic.&lt;br&gt;
Furthermore, tracking the evolution of communities across time provides a way to monitor entities such as viruses or rumors in real-time as they spread. With the &lt;strong&gt;COVID-19 pandemic&lt;/strong&gt; being a top global concern, this problem is in search of a solution. &lt;/p&gt;

&lt;p&gt;In all cases, online methods are well-suited for frequently updated graphs as they save time by re-running the detection only on changed nodes.&lt;/p&gt;

&lt;p&gt;A key property of communities is that they often have very different properties than the rest of the network. If one knows the division of the graph into communities, it is possible to isolate them for closer study.&lt;br&gt;
Conversely, this can allow one to treat entire communities as &lt;strong&gt;single meta-nodes&lt;/strong&gt;, effectively reducing the size of the graph and allowing analysis with complex methods that otherwise wouldn’t scale – big data turned small!&lt;/p&gt;

&lt;p&gt;Communities detected by LabelRankT have a quality that matches traditional offline algorithms. Being &lt;strong&gt;deterministic&lt;/strong&gt;, the algorithm is also suitable for scientific applications that call for &lt;strong&gt;replicability&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Community detection is a common task in graph analytics owing to its wide variety of applications, but in big data, it faces concerns on the grounds of scalability in dynamic environments. Online methods such as LabelRankT help solve this by saving previously calculated results in unchanged graph regions.&lt;/p&gt;

&lt;p&gt;Our team of engineers is currently tackling the problem of graph analytics algorithms on &lt;strong&gt;real-time data&lt;/strong&gt;. If you want to discuss how to apply &lt;strong&gt;online/streaming algorithms&lt;/strong&gt; on connected data, feel free to join our &lt;strong&gt;&lt;a href="https://memgr.ph/join-discord" rel="noopener noreferrer"&gt;Discord server&lt;/a&gt;&lt;/strong&gt; and message us.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MAGE&lt;/strong&gt; shares his wisdom on a &lt;a href="https://twitter.com/intent/follow?screen_name=memgraphmage" rel="noopener noreferrer"&gt;&lt;strong&gt;Twitter&lt;/strong&gt; account&lt;/a&gt;. Get to know him better by following him 🐦&lt;/p&gt;

&lt;p&gt;&lt;a href="https://twitter.com/intent/follow?screen_name=memgraphmage" rel="noopener noreferrer"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fpublic-assets.memgraph.com%2Flabelrankt-community-detection-in-dynamic-environment%2Flabelrankt-twitter-mage.jpg" alt="memgraph-labelrankt-tutorial-mage-twitter"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Last but not least, check out &lt;a href="https://github.com/memgraph/mage" rel="noopener noreferrer"&gt;&lt;strong&gt;MAGE&lt;/strong&gt;&lt;/a&gt; and don’t hesitate to give a star ⭐ or contribute with new ideas.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://memgraph.com/blog?topics=Graph+Algorithm&amp;amp;utm_source=devto&amp;amp;utm_medium=referral&amp;amp;utm_campaign=blog_repost&amp;amp;utm_content=banner#list" rel="noopener noreferrer"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw0azgpsgm3wp9w5sd5wu.png" alt="Read more about graph algorithms on  memgraph.com"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
