In order to make recommendations, the recommendation engines of today can no longer simply identify that a connection between certain users, reviews, and products exists. To truly make their mark in the market, companies need recommendation engines that analyze that data from every angle. A truly accurate and adaptable recommendation engine dissects those relationships to extract their significance, influence, and weight.
Relationships are analyzed using recommendation algorithms. The most widely used recommendation algorithm is collaborative filtering - a method for automatically predicting (filtering) a user's interests by gathering preference or taste information from other users (collaborating). The method's core premise is that if two people share an opinion on one subject, they are more likely than two randomly selected people to share an opinion on a different subject as well.
A collaborative filtering recommendation system for product preferences forecasts which products a user will enjoy based on a partial list of the user's interests (reviews).
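As a minimal sketch of that idea, the toy example below (with made-up users, products, and ratings) scores the products a user hasn't reviewed by weighting other users' ratings with a cosine similarity computed over shared reviews:

```python
from collections import defaultdict

# Hypothetical review data: user -> {product: rating}
reviews = {
    "ana":   {"rod": 5, "reel": 4, "bait": 2},
    "ben":   {"rod": 5, "reel": 5, "net": 4},
    "carol": {"bait": 5, "line": 4},
}

def similarity(a, b):
    """Cosine similarity over the products both users rated."""
    common = set(reviews[a]) & set(reviews[b])
    if not common:
        return 0.0
    dot = sum(reviews[a][p] * reviews[b][p] for p in common)
    norm_a = sum(r * r for r in reviews[a].values()) ** 0.5
    norm_b = sum(r * r for r in reviews[b].values()) ** 0.5
    return dot / (norm_a * norm_b)

def recommend(user):
    """Rank unseen products by similarity-weighted ratings of other users."""
    scores = defaultdict(float)
    for other in reviews:
        if other == user:
            continue
        sim = similarity(user, other)
        for product, rating in reviews[other].items():
            if product not in reviews[user]:
                scores[product] += sim * rating
    return sorted(scores, key=scores.get, reverse=True)

# ben's tastes overlap ana's far more than carol's do,
# so ben's "net" outranks carol's "line" for ana
print(recommend("ana"))
```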
But this algorithm, although very popular, only connects two or three dots within the data, and that is no longer good enough. Companies spend a lot of time and money researching algorithms that take into account everything that could influence a purchase: shopping habits, market trends, wishlist contents, recently viewed items, search history, reviews, platform activity, and more.
Creating an algorithm that takes so many variables into consideration isn't an easy task, especially in relational databases, where creating connections between tables is expensive even for rudimentary algorithms such as collaborative filtering.
But data stored in graph databases is already connected with rich relationships. Graph algorithms use those relationships between nodes to extract key information and combine it into precise recommendations. That is precisely why most of the algorithms used in recommendation engines have been designed especially for graphs. Best of all, the algorithms and their implementations are free to use; you only need to adapt them to your use case.
Some of the graph algorithms used in recommendation engines are breadth-first search (BFS), PageRank, community detection, and link prediction. And if the recommendations need to be highly time-sensitive, these algorithms can be enhanced with their dynamic versions. Dynamic algorithms do not recalculate all the values in the graph, from the first node to the last, as the standard algorithms do. They recalculate only the values affected by changes in the data, shortening computation time and reducing cost.
Breadth-first search (BFS)
In recommendation engines, the breadth-first search can be used to find all the products the user might be interested in buying based on what other people with a similar shopping history bought.
The algorithm is simple: it chooses a starting node and explores all the nodes connected to it. Then it moves on to one of the explored nodes and finds all the nodes connected to that one. Following that logic, it works through every connected node in the graph.
In recommendation engines, the algorithm would start with the user and find all the products the user bought. From those products, it would deepen the search to find all the other users that bought those same products. Then, it would find all the other products those users bought which are not connected to the target user, meaning the user didn’t buy those products.
Upon further filtering, defined by criteria seen in the history of the targeted user, some products would be filtered out. For example, maybe the user never bought or searched for any freshwater fishing equipment, so it doesn't make sense to recommend those products. Maybe the user prefers buying in bulk, or only when items are discounted. In the end, the recommendation engine recommends the items the user is most likely to be interested in buying.
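The three-level expansion described above can be sketched on a hypothetical purchase graph (the users and products are invented for illustration):

```python
# Hypothetical purchase data: who bought what
bought = {
    "ana":   {"rod", "reel"},
    "ben":   {"rod", "net"},
    "carol": {"reel", "lure"},
    "dan":   {"kayak"},
}

# Build the reverse direction of the bipartite graph: product -> buyers
buyers = {}
for user, products in bought.items():
    for p in products:
        buyers.setdefault(p, set()).add(user)

def bfs_recommend(user):
    """Three-level BFS: user -> their products -> co-buyers -> new products."""
    seen_users = {user}
    candidates = set()
    for product in bought[user]:          # level 1: products the user bought
        for other in buyers[product]:     # level 2: users who bought the same
            if other not in seen_users:
                seen_users.add(other)
                # level 3: their other products, minus what the user already owns
                candidates |= bought[other] - bought[user]
    return candidates

# ana reaches "net" via ben and "lure" via carol;
# dan's kayak is unreachable, so it is never suggested
print(bfs_recommend("ana"))
```

The filtering step from the paragraph above would then prune this candidate set against the user's history before anything is shown.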
PageRank
The PageRank algorithm can be used to recommend the best-fitting or currently trending product the target user might be interested in buying. The recommendation is based on how many times the product was bought and how reliable the users who bought or reviewed it are. A reliable user is one with a valid shopping history and reviews; an unreliable user is a fake customer buying products to inflate their sales numbers and make them appear desirable.
The algorithm calculates the importance of each node by counting how many nodes point to it and what their PageRank (PR) values are. There are a few methods to calculate PR values, but the most widely used is power iteration. PR values lie between 0 and 1, and the values of all nodes in a graph sum to 1. The essential premise of the algorithm is that the more important a node is, the more nodes will probably be connected to it.
In recommendation engines, the PageRank algorithm can be used to detect which products are currently trending, or to find the most influential users, meaning those whose purchases many other users later repeat, and to incorporate that result into the final recommendation.
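The power iteration method can be sketched in a few lines. The graph below (users pointing at products they bought) and the damping factor of 0.85 are illustrative choices, not a prescribed setup:

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power iteration: repeatedly redistribute each node's rank
    along its outgoing edges until the values stabilize."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, neighbors in graph.items():
            if not neighbors:                 # dangling node: spread rank evenly
                for m in nodes:
                    new_rank[m] += damping * rank[node] / n
            else:
                for m in neighbors:
                    new_rank[m] += damping * rank[node] / len(neighbors)
        rank = new_rank
    return rank

# Hypothetical "user bought/reviewed product" edges pointing at products
graph = {
    "ana": ["rod"], "ben": ["rod", "net"], "carol": ["rod"],
    "rod": [], "net": [],
}
ranks = pagerank(graph)
# "rod" collects rank from three users, so it outranks "net"
```

Weighting edges by user reliability, as described above, would only change how each node's rank is split among its neighbors.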
Community Detection Algorithms
Recommendation engines benefit from analyzing how users behave. Based on that behavior, they can identify customers who act similarly and group them together. Once a group of users with similar buying habits is identified, recommendations can be targeted at the groups they belong to and the habits of each group.
Detecting groups of people, or communities, is done by using community detection graph algorithms. Graph communities are groups of nodes where nodes inside of the group have denser and better connections between themselves than with other nodes in the graph.
The most used community detection graph algorithms are Girvan-Newman, Louvain, and Leiden. The Girvan-Newman algorithm detects communities by progressively removing edges from the original network. In each iteration, it removes the edge with the highest edge betweenness, that is, the edge that lies on the most shortest paths between nodes. Once the stopping criteria are met, the engine is left with densely connected communities.
Louvain and Leiden algorithms are similar, but the Leiden algorithm is an improved version of the Louvain algorithm. Both algorithms detect communities by trying to group nodes to optimize modularity, that is, a measure that expresses the quality of the division of a graph into communities.
In the first iteration, each node is its own community, and the algorithm tries to merge communities to improve modularity. It stops once modularity can't be improved between two iterations.
Any of these algorithms can be used in recommendation engines for detecting communities of users. A community can be a group of users that are buying products from the same categories or have similar buying habits or wishlists. Based on the community the customer belongs to, recommendations are given based on what others in the same community are buying.
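To illustrate the modularity-optimization idea behind Louvain and Leiden (greatly simplified: real implementations move individual nodes and aggregate the graph rather than brute-forcing pair merges), here is a greedy sketch on a made-up graph of two tightly knit user groups joined by a single edge:

```python
import itertools

def modularity(adj, communities):
    """Q = sum over communities of (internal edge fraction - expected fraction)."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2   # total number of edges
    q = 0.0
    for comm in communities:
        internal = sum(1 for u in comm for v in adj[u] if v in comm) / 2
        degree = sum(len(adj[u]) for u in comm)
        q += internal / m - (degree / (2 * m)) ** 2
    return q

def greedy_communities(adj):
    """Start with one community per node and merge the pair that most
    improves modularity, until no merge helps."""
    comms = [{u} for u in adj]
    while True:
        best_gain, best_pair = 0.0, None
        base = modularity(adj, comms)
        for i, j in itertools.combinations(range(len(comms)), 2):
            merged = comms[:i] + comms[i+1:j] + comms[j+1:] + [comms[i] | comms[j]]
            gain = modularity(adj, merged) - base
            if gain > best_gain:
                best_gain, best_pair = gain, (i, j)
        if best_pair is None:
            return comms
        i, j = best_pair
        comms = comms[:i] + comms[i+1:j] + comms[j+1:] + [comms[i] | comms[j]]

# Two triangles of users connected by the single edge c-d
adj = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
print(greedy_communities(adj))  # the two triangles emerge as communities
```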
Link Prediction
Graph neural networks (GNNs) have become very popular in the last few years. Every node in a graph can have a feature vector (embedding), which describes that node with a vector of numbers. Standard neural networks treat a node's connections to others as, at most, just another feature.
However, GNNs can leverage both content information (user and product node features) as well as graph structure (user-product relationships), meaning they also consider node features that are in the neighborhood of the targeted node. To learn more about GNNs, check out this article and this video.
In the recommendation engine, GNNs can be used for Link Prediction tasks. Link prediction requires the GNN model to predict the relationship between two given nodes or predict the target node (or source node) given a source node (or target node) and a labeled relationship.
That means the GNN model within the recommendation engine would be given two nodes as input: one representing a customer and the other a product. Based on the current graph structure and the features of those two nodes, the model predicts whether the customer will buy the product. The more active the user is, the more the GNN model learns about them, and the better its recommendations become.
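A toy sketch of that scoring step, with hand-picked embeddings standing in for what a trained GNN encoder would actually produce:

```python
import math

# Hypothetical embeddings a GNN encoder might output for users and products;
# in a real system these come from message passing over the graph
embeddings = {
    "ana":   [0.9, 0.1],
    "rod":   [0.8, 0.2],
    "kayak": [0.1, 0.9],
}

def link_probability(user, product):
    """Score a candidate (user)-[:BUYS]->(product) edge: the dot product of
    the two embeddings, squashed to (0, 1) with a sigmoid."""
    score = sum(u * p for u, p in zip(embeddings[user], embeddings[product]))
    return 1 / (1 + math.exp(-score))

print(link_probability("ana", "rod"))    # high: the vectors point the same way
print(link_probability("ana", "kayak"))  # lower: the vectors barely overlap
```

A real model would be trained so that embeddings of customers and the products they buy end up close together, making this dot product meaningful.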
Dynamic algorithms
Data in recommendation engines is constantly being created, deleted, and updated. At one point, the season for certain fish species starts, or a boat proves faulty and causes an accident, and people are no longer browsing, let alone buying, products from that brand. Recommendation algorithms need to take into account every change in the market to make the best recommendations.
The standard algorithms mentioned above need to recalculate all the values, starting from the very first node, on every change or every few changes. This redundancy creates a bottleneck, as it is time-consuming and computationally expensive.
But graphs have a solution to this problem as well - dynamic graph algorithms. With dynamic graph algorithms, computations are no longer executed on the entire dataset; the graph properties are updated from the previous set of values.
Recommendation graph algorithms mentioned above, such as PageRank, Louvain, and Leiden, have dynamic counterparts that can be used in streaming environments with local changes. With the dynamic versions, recommendations become time-sensitive and adapt to changes almost instantly.
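A minimal way to see the benefit is warm-starting power iteration from the previous PageRank values after a small change, instead of restarting from a uniform vector. (This is only the simplest form of the idea; true dynamic algorithms localize the recomputation much further. The graph below is made up.)

```python
def pagerank(graph, rank=None, damping=0.85, tol=1e-8):
    """Power iteration with a convergence check; passing `rank` warm-starts
    the computation from a previous result instead of a uniform vector."""
    n = len(graph)
    if rank is None:
        rank = {u: 1.0 / n for u in graph}
    iterations = 0
    while True:
        iterations += 1
        new = {u: (1 - damping) / n for u in graph}
        for u, nbrs in graph.items():
            share = rank[u] / (len(nbrs) or n)
            for v in (nbrs or graph):        # dangling nodes spread evenly
                new[v] += damping * share
        if sum(abs(new[u] - rank[u]) for u in graph) < tol:
            return new, iterations
        rank = new

graph = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
ranks, cold = pagerank(graph)

graph["d"].append("c")                     # a single new edge arrives
_, warm = pagerank(graph, rank=ranks)      # reuse the previous values
_, scratch = pagerank(graph)               # recompute from scratch
# the warm-started run converges in fewer iterations than starting over
```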
Conclusion
The power behind any recommendation engine is the algorithms used to create recommendations, and the most powerful recommendation algorithms are made especially for graph data. In this blog post, we covered more than five algorithms that calculate precise and effective recommendations - and just as many reasons why your recommendation engine should be using a graph database instead of a relational one.
If you are already using a graph database, but some of these algorithms are new to you, we recommend you utilize them in your engine. No recommendation engine is necessary for that recommendation! It’s that clear!