In this article you'll learn about recommendation system modeling approaches and implementations.
A recommendation system predicts a user's future preferences and recommends the items that user is most likely to prefer.
The goal of a recommendation system is to expose people to items that they will like.
Examples include which books a customer might buy next on Amazon, or which movie or series a Netflix user will watch next.
Matrix Factorization
Many recommendation system algorithms rely on matrix factorization. This can be accomplished using techniques such as Singular Value Decomposition (SVD) and Alternating Least Squares (ALS), which you'll learn about in this section.
Implementing Recommender Systems with surprise
surprise is a library that is optimized to efficiently create recommendations. You'll get a chance to use this library to code up different implementations of collaborative filtering recommendation systems.
Applications of Recommendation Systems
Suggesting merchants or items a customer might be interested in after buying a product in a marketplace
Estimating the profit and loss of many competing items and making recommendations to the customer (e.g. buying and selling stocks)
Recommending a customer-centric or product-centric offering based on the customer's experience
Enhancing customer engagement by providing offers that are highly appealing to the customer
Recommendation Systems Approaches
There are two main types of recommendation systems:
1. unpersonalized
2. personalized

We will focus on personalized recommendation systems because that's where data scientists can provide the most value to companies. To start off, though, let's investigate some unpersonalized systems, because they can be productive in their own right.
Unpersonalized recommendations are based on the entire user pool: whatever item is most popular at a given time is recommended to you, even if it's something you are completely uninterested in.
An example of an unpersonalized recommendation would be on YouTube when it recommends the most viewed videos.
Personalized Recommendations
The general problem of personalized recommendation systems can be summarized as:
Given: the profile of the "active" user and possibly some situational context, e.g. the user browsing a product or making a purchase.
Required: a set of recommendable items, and a score for each item in that set.
Profile:
User profile may contain past purchases, ratings in either implicit or explicit form, demographics and interest scores for item features.
The techniques below make use of different similarity metrics to determine how "similar" items are to one another. The most common similarity metrics are:
1. Euclidean distance
2. Cosine similarity
3. Pearson correlation
4. Jaccard index (useful with binary data)
Each one of these distance metrics has its advantages and disadvantages depending on the type of ratings you are using and the characteristics of your data.
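As a quick illustration, here's how each of these metrics can be computed with SciPy for two made-up rating vectors (the vectors and values are chosen purely for demonstration):

```python
import numpy as np
from scipy.spatial.distance import cosine, euclidean, jaccard
from scipy.stats import pearsonr

# Two toy rating vectors for the same four items (0 = not rated).
a = np.array([5, 3, 0, 1], dtype=float)
b = np.array([4, 0, 0, 1], dtype=float)

euclid = euclidean(a, b)        # distance: lower means more similar
cos_sim = 1 - cosine(a, b)      # SciPy returns cosine *distance*, so flip it
pearson = pearsonr(a, b)[0]     # linear correlation between the rating vectors
jac_sim = 1 - jaccard(a > 0, b > 0)  # Jaccard on binary "rated / not rated" vectors

print(euclid, cos_sim, pearson, jac_sim)
```

Note how the Jaccard index only cares about *which* items were rated, which is why it suits binary data.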
Content-Based Recommenders
The main idea behind content-based recommenders is that if you like an item, you will also like "similar" items.
An advantage of content-based recommender systems is that they give the user a bit more insight into why they are seeing particular recommendations.
A disadvantage of content-based recommender systems is that they often require manual or semi-manual tagging of each product.
Collaborative Filtering Systems
Main Idea: If user A likes items 1, 2, 3 and 4 and user B likes items 1, 2 and 3, then it is highly likely that user B will also like item 4.
The key idea behind collaborative filtering is that similar users share similar interests and that users tend to like items that are similar to one another.
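A bare-bones sketch of that idea in NumPy (toy numbers, user-user cosine similarity computed over co-rated items):

```python
import numpy as np

# Rows = users A and B, columns = items 1-4; 0 marks "not yet rated".
ratings = np.array([[5.0, 4.0, 5.0, 4.0],   # user A rated all four items
                    [5.0, 4.0, 5.0, 0.0]])  # user B has not rated item 4

def cosine_sim(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Compare the two users only on the items both have rated (items 1-3 here).
common = (ratings[0] > 0) & (ratings[1] > 0)
sim = cosine_sim(ratings[0, common], ratings[1, common])

# Predict B's rating for item 4 as A's rating weighted by their similarity.
predicted = sim * ratings[0, 3]
print(predicted)  # the users agree perfectly, so B is predicted to rate item 4 a 4.0
```

Real collaborative filtering aggregates over many neighbors rather than one, but the core logic is exactly this.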
Collaborative Filtering with Singular Value Decomposition
Recommendation systems apply Information Retrieval (IR) techniques to select information relevant to a given user. Collaborative Filtering (CF) is currently the most widely used approach to building recommendation systems, and it uses users' behavior, in the form of user-item ratings, to make predictions.
Collaborative Filtering
A collaborative filtering system has two main tasks:
1. Predict a numerical value expressing the predicted score of an item for a user. The predicted value should be on the same scale used by all users for their ratings (e.g. a number of stars, or a rating between 0 and 5).
2. Recommend a Top-N list of items that the active user will like the most, based on the highest predicted ratings for the items they have not yet seen.
Model-Based Collaborative Filtering
Matrix Factorization models are based on the concept of the Latent Variable Model.
Latent variable models try to explain complex relationships between several variables by way of simple relationships between variables and underlying "latent" variables.
In model-based recommendations, techniques like Latent Semantic Indexing (LSI) and the dimensionality reduction method Singular Value Decomposition (SVD) are typically combined to deal with sparsity.
Let's look at how a recommendation problem can be translated into a matrix decomposition context. The idea behind such models is that the preferences of users can be determined by a small number of hidden factors. We can call these factors Embeddings.
SVD in Python
SciPy has a straightforward implementation of SVD to help us avoid all the complex steps of SVD. We can use the svds() function to decompose a matrix as shown below. We will use csc_matrix() to create a sparse matrix object.
In this example, consider A as the utility matrix linking users and products. After the decomposition, U will be the user features matrix, Σ will be the diagonal matrix of singular values (essentially weights), and V.T will be the movie features matrix. U and V.T are orthogonal, and represent different things: U represents how much users like each feature, and V.T represents how relevant each feature is to each movie.
Now we can recreate the original ratings matrix by multiplying the three factors of the matrix together. Let's look at the exact values and then the rounded values to get an idea of what our ratings should be.
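Here is a minimal sketch of that workflow with SciPy, using a made-up 5×4 utility matrix (note that `svds` requires `k < min(A.shape)`, and the order of its singular values is not guaranteed to be descending, though `U`, `sigma`, and `Vt` always stay consistently aligned with one another):

```python
import numpy as np
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import svds

# Toy utility matrix A: rows = users, columns = movies, 0 = unrated.
A = csc_matrix([[5.0, 4.0, 0.0, 1.0],
                [4.0, 0.0, 0.0, 1.0],
                [1.0, 1.0, 0.0, 5.0],
                [1.0, 0.0, 4.0, 4.0],
                [0.0, 1.0, 5.0, 4.0]])

# Keep k=3 latent features (the "embeddings" described above).
U, sigma, Vt = svds(A, k=3)

# Rebuild the diagonal Σ and multiply the three factors back together
# to recreate (approximately) the original ratings matrix.
Sigma = np.diag(sigma)
A_hat = U @ Sigma @ Vt
print(np.round(A_hat, 2))
```

Comparing `A_hat` to `A` shows how the low-rank reconstruction fills every cell, including the ones that were originally unrated.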
Memory-based algorithms use all the data, all the time, to make predictions, whereas model-based algorithms use the data to learn/train a model which can later be used to make predictions.
Implementing Recommendation Engines with Surprise
We'll be working with a dataset built into surprise called the jester dataset. This dataset contains jokes rated by users on a scale of -10 to 10, based on a user's perceived humor level for a given joke. Let's get recommending some jokes!
Memory-Based Methods (Neighborhood-Based)
Let's start by calculating the simpler neighborhood-based approaches.
Pearson correlation seems to have performed better than cosine similarity in the basic KNN model, so we can go ahead and use Pearson correlation as our similarity metric of choice for future models. The next model we're going to try is KNNWithMeans. This is the same as the basic KNN model, except that it takes into account the mean rating of each user or item, depending on whether you are performing user-user or item-item similarities, respectively.
Model-Based Methods (Matrix Factorization)
When SVD is calculated for recommendation systems, it is preferable to use a modified version called "Funk SVD" that only takes the rated values into account, ignoring whatever items have not been rated by users.
The optimal parameters are:
{'n_factors': 100, 'n_epochs': 10, 'lr_all': 0.005, 'reg_all': 0.4}
This model performed worse than the others! In general, the advantages of matrix factorization start to show when the size of the dataset becomes massive. At that point, storage becomes a challenge for the memory-based models, and there is enough data for latent factors to become extremely apparent.
Making Predictions
Let's access some basic functionality of surprise models to get predicted ratings for a given user and item.
You now should have an understanding of the basic considerations one should take note of when coding a recommendation system as well as how to implement them in different ways using surprise.
Thank you so much. Happy hacking!
Willy Angole