Introduction
"Merry Christmas!" We read these words countless times on social networks at the most wonderful time of the year. Christmas floods social networks with greetings, wishes, songs, and pictures - everything related to Christmas and the holidays.
Twitter is no exception. Everything is ready for Christmas madness. To cheer you up, Memgraph MAGE has prepared a Christmas analysis. In such a dynamic environment, our goal is to find the most "Christmassy" person of all!
Model it till you make it
We first need to agree on the criteria by which we choose the most Christmassy person. Interactions on Twitter mostly take place through commenting and retweeting. Let us go with the latter.
If we search for "#Christmas" in the Twitter search engine, we will get thousands of different posts. These posts will be the basis of our Christmas community. Our winner definitely needs to be a user from this community!
Then, if we take the retweeting of #Christmas tweets as an interaction, we can build a graph of the Christmas community in which users share Christmas announcements with each other. The graph is simple, precisely as it should be for the analysis. There is only one entity type, USER, and a user is linked to another user if it retweeted that user's post.
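To make the model concrete, here is a minimal Python sketch of such a retweet graph. It is only an illustration: the function name `build_retweet_graph` and the sample usernames are hypothetical, and in the article the graph itself lives in Memgraph rather than in a Python dictionary.

```python
from collections import defaultdict


def build_retweet_graph(retweets):
    """Build a directed graph mapping each user to the set of users they retweeted."""
    graph = defaultdict(set)
    for retweeter, author in retweets:
        # Edge direction: the user who retweeted points to the original author.
        graph[retweeter].add(author)
        # Touch the author so users who never retweet still appear as nodes.
        graph[author]
    return dict(graph)


# Hypothetical sample data: (user who retweeted, user who was retweeted)
sample_retweets = [
    ("alice", "santa"),
    ("bob", "santa"),
    ("santa", "rudolph"),
    ("alice", "bob"),
]

graph = build_retweet_graph(sample_retweets)
for user, retweeted in sorted(graph.items()):
    print(user, "->", sorted(retweeted))
```

Storing only users and retweet edges keeps the structure small, which is all a ranking algorithm such as PageRank needs as input.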
The most Christmassy user
The Christmas spirit lasts through the holidays, and new data, in the form of retweets, arrives every moment. To deal with this, we need a streaming platform for graph analysis. Memgraph will serve well, and as previously announced, MAGE has prepared a tool to calculate the winner.
PageRank is a natural fit for finding the most Christmassy person: the winner is the user who gets retweeted by other Christmassy people. Because data arrives at high speed, MAGE has fortunately upgraded the good old PageRank in its garage and prepared a dynamic version - Dynamic PageRank.
We will use the dynamic implementation because of its advantages and adaptability to streaming data. Here are just a few:
- Changes are local - to obtain a new PageRank value, it is not necessary to execute the algorithm from scratch, but only around the entities where the changes occurred.
- The update is faster than a full re-run of a static algorithm.
- The current PageRank values are stored, so they can be read in O(1) time.
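For comparison, the sketch below shows the classic static PageRank computed by power iteration in plain Python. It is not MAGE's dynamic implementation; the function name, the damping factor of 0.85, and the sample graph are assumptions chosen for illustration. The point is that every call walks the whole graph again, which is exactly the cost a dynamic version tries to avoid when only a few retweets arrive.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Static PageRank by power iteration.

    `graph` maps every node to the set of nodes it links to; each target
    must also be present as a key.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not graph[node])
        new_rank = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for node in nodes:
            targets = graph[node]
            if targets:
                # Each node splits its rank equally among the users it retweeted.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical retweet graph: retweeter -> users whose posts were retweeted
sample_graph = {
    "alice": {"santa", "bob"},
    "bob": {"santa"},
    "santa": {"rudolph"},
    "rudolph": set(),
}

ranks = pagerank(sample_graph)
for user, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(f"{user}: {score:.3f}")
```

A user's score grows when highly ranked users retweet them, which matches the idea that the winner is retweeted by other Christmassy people.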
Dynamic PageRank is implemented as part of the MAGE repository in Memgraph. After installing the needed libraries, we need to set a trigger that calculates the new values as soon as new entities arrive in the graph.
CREATE TRIGGER pagerank_online
BEFORE COMMIT
EXECUTE CALL pagerank_online.update(createdVertices, createdEdges, deletedVertices, deletedEdges)
YIELD *
SET node.pagerank = rank;
CALL pagerank_online.set(100, 0.2) YIELD *;
This way, we ensure that the data structures that keep the state of the algorithm are set up, and that the pagerank property is recalculated and set whenever new data is added.
And the winner is...
Retweets streamed into our network for several hours while we waited for a satisfactory amount of data. After processing N retweets, we retrieve the pagerank property of every user with the following query, sort the values in descending order, and keep the top 10:
MATCH (n)
RETURN n, n.pagerank AS rank
ORDER BY rank DESC
LIMIT 10;
These people take Christmas seriously. We wish them, and you, the reader of this article, all the best.
Conclusion
This article describes the idea of how a smart algorithm can spare a large number of computations and deliver valuable insights into data faster than before.
Our team of engineers is currently tackling the problem of graph analytics algorithms on real-time data. If you want to discuss how to apply online/streaming algorithms on connected data, feel free to join our Discord server and message us.
MAGE shares his wisdom on a Twitter account. Get to know him better by following him 🐦
Last but not least, check out MAGE and don't hesitate to give a star ⭐ or contribute with new ideas.