In the previous part of this series, we learnt what a recommender system is, looked at some of its use cases, and saw how Amazon Personalize works. In this part, we will go through the steps required to set up Amazon Personalize.
Setting Up Amazon Personalize
To get started with Amazon Personalize, you need an Amazon Web Services (AWS) account. If you don't have one, follow the instructions here to create an account.
Now that you've created your AWS account, you need to create an AWS Identity and Access Management (IAM) admin user. An IAM user is an entity that represents a person or application interacting with AWS. An IAM user with administrative permissions has unrestricted access to the AWS services in your account. To create an IAM user, follow the set of instructions here.
Before you can use Amazon Personalize, you have to set up permissions that give your IAM user access to the Amazon Personalize console and API operations. To set up the IAM user policy, follow the instructions here.
Next, create an IAM role for Amazon Personalize. An IAM role is an IAM identity that you create in your account with specific permissions. Like an IAM user, a role is an AWS identity with permission policies that determine what the identity can and cannot do in AWS. However, a role is not uniquely associated with one person; it is intended to be assumable by anyone who needs it. To create the IAM role, follow the instructions here.
Use these configurations while creating the IAM role:
- For Choose the service that will use this role, choose `Personalize`.
- For Attach permissions policies, either choose the IAM user policy you created previously, or choose `AmazonPersonalizeFullAccess` from the list of AWS managed policies.
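If you prefer to script this step, the sketch below creates an equivalent role with boto3. It assumes an IAM client is passed in (for example `boto3.client("iam")`), uses a hypothetical role name `PersonalizeRole`, and assumes the managed-policy ARN shown in the code; check the exact ARN in the IAM console before relying on it. The trust policy is the part that lets the Amazon Personalize service assume the role.

```python
import json

# Hypothetical role name; use any name you like
ROLE_NAME = "PersonalizeRole"

# Assumed ARN of the AWS managed AmazonPersonalizeFullAccess policy;
# verify it in the IAM console
PERSONALIZE_FULL_ACCESS_ARN = (
    "arn:aws:iam::aws:policy/service-role/AmazonPersonalizeFullAccess"
)


def personalize_trust_policy() -> dict:
    """Trust policy that allows the Amazon Personalize service to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "personalize.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_personalize_role(iam, role_name: str = ROLE_NAME) -> str:
    """Create the role with a boto3 IAM client, attach the policy, return the role ARN."""
    response = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(personalize_trust_policy()),
    )
    iam.attach_role_policy(RoleName=role_name, PolicyArn=PERSONALIZE_FULL_ACCESS_ARN)
    return response["Role"]["Arn"]
```

With AWS credentials configured, `create_personalize_role(boto3.client("iam"))` would return the role ARN that you need later when importing data.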
After creating the IAM role for Amazon Personalize, prepare your training data and upload it to Amazon S3. We will be using a Custom dataset group. Download the user-interactions file here.
You have to format `user_interactions.csv` so that Amazon Personalize can understand the data. Change the header row from `user_id`, `movie_id`, and `timestamp` to `USER_ID`, `ITEM_ID`, and `TIMESTAMP`. You can edit the headers manually in a spreadsheet program such as Excel, or use the Python pandas package.
Follow these steps to change the header row using pandas:

```python
import pandas as pd

# Read user_interactions.csv into a pandas DataFrame
df = pd.read_csv("user_interactions.csv", encoding="latin-1")

# Lower-case the existing headers first so the rename works whether the file
# uses user_id or user_Id (rename silently skips keys that don't match)
df.columns = df.columns.str.lower()
df = df.rename(columns={"user_id": "USER_ID", "movie_id": "ITEM_ID", "timestamp": "TIMESTAMP"})

# Save the DataFrame into a new CSV file named interactions.csv
df.to_csv("interactions.csv", index=False)
```
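Because `DataFrame.rename` ignores column names it can't find, it is worth checking the output file before uploading it. The following is a minimal sketch using only the standard library; it assumes `interactions.csv` is UTF-8 (the pandas default for `to_csv`) with one record per line. Amazon Personalize expects `TIMESTAMP` values as Unix epoch time in seconds, which matches the `long` type in the schema used later, so the check also flags non-integer timestamps.

```python
import csv

REQUIRED_COLUMNS = ["USER_ID", "ITEM_ID", "TIMESTAMP"]


def check_interactions_csv(path: str) -> list:
    """Return a list of problems found in an interactions CSV; empty means it looks OK."""
    problems = []
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        header = reader.fieldnames or []
        missing = [col for col in REQUIRED_COLUMNS if col not in header]
        if missing:
            return [f"missing columns: {missing}"]
        # Line 1 is the header, so data rows start at line 2
        # (assumes no quoted fields span several lines)
        for line_no, row in enumerate(reader, start=2):
            value = (row["TIMESTAMP"] or "").strip()
            # TIMESTAMP should be a whole number of seconds since the epoch
            if not value.isdigit():
                problems.append(f"line {line_no}: TIMESTAMP {value!r} is not an integer")
    return problems
```

Running `print(check_interactions_csv("interactions.csv"))` should print an empty list if the rename worked.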
Next, upload `interactions.csv` to an Amazon S3 bucket. To create an S3 bucket and upload files to it, follow the set of instructions here.
For Region, choose US West (Oregon) `us-west-2`.
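The same two steps can be scripted. This sketch assumes a boto3 S3 client is passed in and that the bucket name you choose is not already taken (S3 bucket names are globally unique); the function name is hypothetical.

```python
from typing import Optional

REGION = "us-west-2"  # US West (Oregon), the Region used in this tutorial


def upload_to_new_bucket(s3, bucket: str, filename: str, key: Optional[str] = None) -> str:
    """Create a bucket in REGION, upload one local file, and return its S3 URI."""
    key = key or filename
    # Outside us-east-1, create_bucket needs an explicit LocationConstraint
    s3.create_bucket(
        Bucket=bucket,
        CreateBucketConfiguration={"LocationConstraint": REGION},
    )
    # upload_file(Filename, Bucket, Key) copies the local file into the bucket
    s3.upload_file(filename, bucket, key)
    return f"s3://{bucket}/{key}"
```

For example, `upload_to_new_bucket(boto3.client("s3", region_name="us-west-2"), "your-bucket-name", "interactions.csv")` would return the S3 URI you paste into the import job later.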
After uploading your file to Amazon S3, you have to set up permissions that allow Amazon Personalize to access the S3 bucket. This involves two steps:
- Attaching an Amazon S3 policy to your Amazon Personalize service role.
- Attaching an Amazon Personalize access policy to your Amazon S3 bucket.
To attach an Amazon S3 policy to your Amazon Personalize service role, do the following:
- Sign in to the IAM console.
- In the navigation pane, choose Policies, and choose Create Policy.
- Choose the JSON tab and replace the policy with the following, substituting the name of your S3 bucket for `bucket-name`.
```json
{
    "Version": "2012-10-17",
    "Id": "PersonalizeS3BucketAccessPolicy",
    "Statement": [
        {
            "Sid": "PersonalizeS3BucketAccessPolicy",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::bucket-name",
                "arn:aws:s3:::bucket-name/*"
            ]
        }
    ]
}
```
- Choose Next: Tags. Optionally add any tags and choose Review.
- Give the policy a name.
- For Description, enter a short sentence describing this policy, for example, `Allow Amazon Personalize to access its Amazon S3 bucket`.
- Choose Create policy.
- In the navigation pane, choose Roles, and choose the role you created for Amazon Personalize.
- For Permissions, choose Attach Policies.
- To display the policy in the list, type part of the policy name in the Filter policies filter box.
- Choose the check box next to the policy you created earlier in this procedure.
- Choose Attach policy.
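The console steps above can also be done with boto3. In this sketch the IAM client is passed in, and the role name and policy name used in the usage line are hypothetical; the policy document is the JSON shown above with `bucket-name` filled in.

```python
import json


def s3_access_policy(bucket: str) -> dict:
    """The identity policy shown above, with bucket-name filled in."""
    return {
        "Version": "2012-10-17",
        "Id": "PersonalizeS3BucketAccessPolicy",
        "Statement": [
            {
                "Sid": "PersonalizeS3BucketAccessPolicy",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                # The bucket ARN covers ListBucket; the /* ARN covers GetObject
                "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            }
        ],
    }


def attach_s3_policy(iam, role_name: str, bucket: str, policy_name: str) -> str:
    """Create the policy with a boto3 IAM client, attach it to the role, return its ARN."""
    response = iam.create_policy(
        PolicyName=policy_name,
        PolicyDocument=json.dumps(s3_access_policy(bucket)),
        Description="Allow Amazon Personalize to access its Amazon S3 bucket",
    )
    policy_arn = response["Policy"]["Arn"]
    iam.attach_role_policy(RoleName=role_name, PolicyArn=policy_arn)
    return policy_arn
```

Usage might look like `attach_s3_policy(boto3.client("iam"), "PersonalizeRole", "your-bucket-name", "PersonalizeS3Access")`.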
Now you have to attach a bucket policy to the Amazon S3 bucket containing your data. This permission allows Amazon Personalize access to the S3 bucket.
To create a bucket policy:
- Open the Amazon S3 console.
- In the Buckets list, choose the name of the bucket that you want to create a bucket policy for.
- Choose Permissions.
- Under Bucket Policy, choose Edit. This opens the Edit bucket policy page.
- On the Edit bucket policy page, replace the JSON in the Policy section with the following:
```json
{
    "Version": "2012-10-17",
    "Id": "PersonalizeS3BucketAccessPolicy",
    "Statement": [
        {
            "Sid": "PersonalizeS3BucketAccessPolicy",
            "Effect": "Allow",
            "Principal": {
                "Service": "personalize.amazonaws.com"
            },
            "Action": [
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::bucket-name",
                "arn:aws:s3:::bucket-name/*"
            ]
        }
    ]
}
```
Replace `bucket-name` with the name of your S3 bucket.
- Choose Save changes to save the policy.
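For reference, here is a boto3 sketch of the same bucket-policy step, assuming an S3 client is passed in. Note that `put_bucket_policy` replaces any policy already on the bucket, just as the console step replaces the JSON in the Policy section.

```python
import json


def personalize_bucket_policy(bucket: str) -> dict:
    """The bucket policy shown above, with bucket-name filled in."""
    return {
        "Version": "2012-10-17",
        "Id": "PersonalizeS3BucketAccessPolicy",
        "Statement": [
            {
                "Sid": "PersonalizeS3BucketAccessPolicy",
                "Effect": "Allow",
                # Unlike the role policy, a bucket policy names who is granted access
                "Principal": {"Service": "personalize.amazonaws.com"},
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            }
        ],
    }


def put_personalize_bucket_policy(s3, bucket: str) -> None:
    """Apply the policy with a boto3 S3 client; this replaces any existing bucket policy."""
    s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(personalize_bucket_policy(bucket)))
```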
Before you start working in Amazon Personalize, determine your use case. Your use case determines which recipe is used to train the model. Recipes are Amazon Personalize algorithms that are prepared for specific use cases.
The available use cases for Amazon Personalize include:
- Recommending items for users using the `USER_PERSONALIZATION` recipe.
- Ranking items for a user using the `PERSONALIZED_RANKING` recipe.
- Recommending similar items using the `RELATED_ITEMS` recipe.
- Getting user segments using the `USER_SEGMENTATION` recipes.
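The names in the list above are recipe types; each type has one or more built-in recipes. The mapping below is a partial list of built-in recipe names as I understand them, not an exhaustive or authoritative one. Calling `personalize.list_recipes()` on a boto3 Personalize client shows what is actually available in your account. The ARN format `arn:aws:personalize:::recipe/<name>` is the one used for built-in recipes.

```python
# Recipe type -> some built-in recipes of that type (partial list; confirm
# with personalize.list_recipes() in your account)
RECIPES_BY_TYPE = {
    "USER_PERSONALIZATION": ["aws-user-personalization"],
    "PERSONALIZED_RANKING": ["aws-personalized-ranking"],
    "RELATED_ITEMS": ["aws-similar-items", "aws-sims"],
    "USER_SEGMENTATION": ["aws-item-affinity", "aws-item-attribute-affinity"],
}


def recipe_arn(recipe_name: str) -> str:
    """Build the ARN of a built-in recipe from its name."""
    return f"arn:aws:personalize:::recipe/{recipe_name}"


def recipes_for(recipe_type: str) -> list:
    """ARNs of the listed built-in recipes for a recipe type such as RELATED_ITEMS."""
    return [recipe_arn(name) for name in RECIPES_BY_TYPE[recipe_type]]
```

For this tutorial's two use cases, `recipes_for("USER_PERSONALIZATION")` and `recipes_for("RELATED_ITEMS")` are the relevant lookups; the solution created later uses `aws-similar-items`.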
Our use cases are recommending items to users and recommending similar items. To start working with Amazon Personalize, create a Custom dataset group; to do this, follow the set of instructions here.
After creating the dataset group, you are directed to the Create interactions dataset page. Follow these steps to create an interactions dataset and schema:
- Specify a name for the dataset.
- Choose Create new schema.
- Enter a name for the schema.
- Paste the following schema JSON into Schema definition.
```json
{
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {
            "name": "USER_ID",
            "type": "string"
        },
        {
            "name": "ITEM_ID",
            "type": "string"
        },
        {
            "name": "TIMESTAMP",
            "type": "long"
        }
    ],
    "version": "1.0"
}
```
- Choose Create dataset and continue.
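The console flow above (dataset group, schema, then interactions dataset) maps onto three boto3 Personalize calls. This sketch assumes a Personalize client is passed in; the group, schema, and dataset names are yours to choose. Because the dataset group is created asynchronously, the sketch polls `describe_dataset_group` until it is `ACTIVE` before adding the dataset; the 10-second interval is an arbitrary choice, and `sleep` is injectable only to make the function easy to test.

```python
import json
import time

# Same schema as the JSON pasted into Schema definition above
INTERACTIONS_SCHEMA = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "ITEM_ID", "type": "string"},
        {"name": "TIMESTAMP", "type": "long"},
    ],
    "version": "1.0",
}


def create_interactions_dataset(personalize, group_name, schema_name, dataset_name,
                                delay=10, sleep=time.sleep):
    """Create a dataset group plus the interactions schema and dataset.

    Returns (dataset group ARN, dataset ARN).
    """
    group_arn = personalize.create_dataset_group(name=group_name)["datasetGroupArn"]
    # The dataset group is created asynchronously; wait until it is ACTIVE
    while True:
        group = personalize.describe_dataset_group(datasetGroupArn=group_arn)
        status = group["datasetGroup"]["status"]
        if status == "ACTIVE":
            break
        if status == "CREATE FAILED":
            raise RuntimeError(f"dataset group {group_arn} failed to create")
        sleep(delay)
    schema_arn = personalize.create_schema(
        name=schema_name, schema=json.dumps(INTERACTIONS_SCHEMA)
    )["schemaArn"]
    dataset_arn = personalize.create_dataset(
        name=dataset_name,
        schemaArn=schema_arn,
        datasetGroupArn=group_arn,
        datasetType="Interactions",
    )["datasetArn"]
    return group_arn, dataset_arn
```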
Next, import the interaction data from Amazon S3 into Amazon Personalize. To do this, create a dataset import job by following the steps here.
After creating the dataset import job, you will be redirected to the Dashboard Overview page.
In the Create dataset row, the interactions data shows a Pending status. Amazon Personalize takes about 5-15 minutes to create the interactions dataset.
While we wait for Amazon Personalize to create the interactions dataset, we will import the Item dataset. Before importing the Item dataset into Personalize, we need the item metadata. You can download the item metadata file used in this tutorial here. Click here to learn more about creating an Item dataset.
After downloading the item metadata, upload the file to the Amazon S3 bucket we created earlier. To upload the item metadata to S3:
- Open the Amazon S3 console.
- In the Buckets list, choose the name of the bucket you want to upload the file to.
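Instead of refreshing the dashboard, you could poll the import job's status. This sketch assumes a boto3 Personalize client and the ARN returned when the import job was created. The one-minute polling interval and one-hour timeout are arbitrary choices, and `sleep` is a parameter only so the function can be tested without waiting.

```python
import time


def wait_for_import_job(personalize, job_arn, delay=60, timeout=3600, sleep=time.sleep):
    """Poll a dataset import job until it is ACTIVE; return the final status."""
    waited = 0
    while True:
        response = personalize.describe_dataset_import_job(datasetImportJobArn=job_arn)
        job = response["datasetImportJob"]
        status = job["status"]
        if status == "ACTIVE":
            return status
        if status == "CREATE FAILED":
            raise RuntimeError(f"import failed: {job.get('failureReason', 'unknown reason')}")
        if waited >= timeout:
            raise TimeoutError(f"{job_arn} still {status} after {waited} seconds")
        # Still CREATE PENDING or CREATE IN_PROGRESS
        sleep(delay)
        waited += delay
```

If the job fails, the `failureReason` in the error message usually points at the cause, such as a header that doesn't match the schema.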
- Choose Upload.
- Drag and drop the item metadata file into the Upload window.
- Choose Upload at the bottom of the page to upload the file to our Amazon S3 bucket.
To import the Item dataset into Amazon Personalize:
- On the Overview dashboard, choose Import item data.
- Enter a dataset name that is easy to distinguish.
- Choose Create new schema.
- Enter the following JSON schema in Schema definition.
```json
{
    "type": "record",
    "name": "Items",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {
            "name": "ITEM_ID",
            "type": "string"
        },
        {
            "name": "YEAR_RELEASED",
            "type": "int"
        },
        {
            "name": "GENRE",
            "type": "string",
            "categorical": true
        },
        {
            "name": "ACTORS",
            "type": "string",
            "categorical": true
        },
        {
            "name": "DIRECTOR",
            "type": "string"
        }
    ],
    "version": "1.0"
}
```
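The column headers in the item metadata CSV have to line up with the field names in this schema, so a quick local comparison can save a failed import. The sketch below assumes the file has a header row and is UTF-8 (pass another encoding if yours isn't). It also shows how several values for one categorical field, such as multiple genres, are written into a single cell: Amazon Personalize separates them with a vertical bar `|`.

```python
import csv

# Field names from the Items schema above
ITEM_FIELDS = ["ITEM_ID", "YEAR_RELEASED", "GENRE", "ACTORS", "DIRECTOR"]


def compare_header(csv_path: str, fields=ITEM_FIELDS, encoding: str = "utf-8"):
    """Return (missing, unexpected) column names compared with the schema fields."""
    with open(csv_path, newline="", encoding=encoding) as f:
        header = next(csv.reader(f), [])
    missing = [name for name in fields if name not in header]
    unexpected = [name for name in header if name not in fields]
    return missing, unexpected


def join_categories(values) -> str:
    """Encode several categorical values (e.g. genres) in one field, separated by '|'."""
    return "|".join(values)
```

A result of `([], [])` means every schema field has a matching column and there are no extra columns.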
Choose Next. You are directed to the Configure dataset import job page.
On the Configure dataset import job page, enter a name for the dataset import job. The name must be 1-63 characters long and can't contain spaces.
Enter the S3 location of your data. If you don't know the file's location in the S3 bucket, find the file in the S3 console, select the check box next to it, and choose Copy S3 URI.
Paste the S3 URI into the S3 location box on the Configure dataset import job page.
In the IAM role section, choose Enter a custom IAM role ARN, and enter the ARN of the IAM role you created earlier. To get the ARN, open the IAM console, choose Roles, and choose that role; the ARN is shown on the role's summary page. Copy the ARN, paste it into the IAM role ARN box on the Configure dataset import job page, and choose Start import.
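The import-job settings above (job name, S3 URI, role ARN) correspond to the parameters of boto3's `create_dataset_import_job`. In this sketch the clients are passed in, and the name check is a conservative assumption based on the character set this tutorial lists for solution names (letters, digits, `_`, 1-63 characters); the service itself may accept a wider set.

```python
import re

# Conservative naming rule (assumption): 1-63 letters, digits, or underscores
NAME_RE = re.compile(r"[A-Za-z0-9_]{1,63}")


def is_valid_name(name: str) -> bool:
    return NAME_RE.fullmatch(name) is not None


def get_role_arn(iam, role_name: str) -> str:
    """Look up a role's ARN with a boto3 IAM client (the value shown in the IAM console)."""
    return iam.get_role(RoleName=role_name)["Role"]["Arn"]


def start_import_job(personalize, job_name, dataset_arn, s3_uri, role_arn) -> str:
    """Start a dataset import job with a boto3 Personalize client; return the job ARN."""
    if not is_valid_name(job_name):
        raise ValueError(f"invalid job name: {job_name!r}")
    if not s3_uri.startswith("s3://"):
        raise ValueError(f"expected an S3 URI such as s3://bucket/file.csv, got {s3_uri!r}")
    response = personalize.create_dataset_import_job(
        jobName=job_name,
        datasetArn=dataset_arn,
        dataSource={"dataLocation": s3_uri},
        roleArn=role_arn,
    )
    return response["datasetImportJobArn"]
```

The same function works for the interactions import earlier; only the dataset ARN and S3 URI change.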
After importing the data, we have to create a solution. A solution is the combination of an Amazon Personalize recipe, customized parameters, and one or more solution versions (trained models) that Amazon Personalize uses to generate recommendations. Amazon Personalize provides recipes for common use cases. You can read more about Amazon Personalize recipes here.
To create the solution, go to the Amazon Personalize Overview page and choose Create solution.
On the Create solution page, enter a name for the solution. The solution name must be 1-63 characters long with no spaces (valid characters: a-z, A-Z, 0-9, and _).
For Solution type, choose Item recommendation.
For Recipe, select `aws-similar-items`. The `aws-similar-items` (Similar-Items) recipe generates recommendations for items that are similar to an item you specify.
Choose Create and train solutions. You are then redirected to the Overview page.
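Creating and training the solution corresponds to two boto3 calls: `create_solution` with the Similar-Items recipe ARN, then `create_solution_version` to start training. This sketch assumes a Personalize client and the dataset group ARN are passed in; the solution name in the test is hypothetical. Training is asynchronous, so the returned version ARN is only usable once `describe_solution_version` reports `ACTIVE`.

```python
# ARN of the aws-similar-items recipe chosen above
SIMILAR_ITEMS_RECIPE_ARN = "arn:aws:personalize:::recipe/aws-similar-items"


def create_and_train_solution(personalize, name: str, dataset_group_arn: str):
    """Create a Similar-Items solution and start training; return (solution ARN, version ARN)."""
    solution_arn = personalize.create_solution(
        name=name,
        datasetGroupArn=dataset_group_arn,
        recipeArn=SIMILAR_ITEMS_RECIPE_ARN,
    )["solutionArn"]
    # If this call reports that the solution isn't ready yet, wait until
    # describe_solution shows status ACTIVE and call it again
    version_arn = personalize.create_solution_version(solutionArn=solution_arn)[
        "solutionVersionArn"
    ]
    return solution_arn, version_arn
```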
After preparing and importing our data and creating our solution, we are ready to deploy the solution version to generate recommendations in real time. A solution version is a trained model that is created as part of a solution in Amazon Personalize. We deploy a solution version by creating an Amazon Personalize campaign. A campaign is a deployed solution version (trained model) with provisioned dedicated transaction capacity for creating real-time recommendations for your application users. After you create a campaign, you use the `GetRecommendations` or `GetPersonalizedRanking` API operations to get recommendations.
Follow the instructions here to create a campaign. This campaign will be used to get recommendations of items similar to a selected item.
In the next part of this series, we will create a movie API and integrate Amazon Personalize into it.
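As a preview of how the campaign is used, here is a sketch of creating it and calling `GetRecommendations` through boto3. It assumes a `personalize` client for `create_campaign` and a separate `personalize-runtime` client for `get_recommendations`; the campaign name in the test is hypothetical. Because a campaign's dedicated capacity is billed, the sketch starts at the minimum of 1 transaction per second, and recommendations only work once the campaign's status is `ACTIVE`.

```python
def create_campaign(personalize, name: str, solution_version_arn: str, min_tps: int = 1) -> str:
    """Deploy a trained solution version as a campaign; return the campaign ARN."""
    response = personalize.create_campaign(
        name=name,
        solutionVersionArn=solution_version_arn,
        # Provisioned capacity is billed, so start with the minimum
        minProvisionedTPS=min_tps,
    )
    return response["campaignArn"]


def similar_item_ids(runtime, campaign_arn: str, item_id: str, num_results: int = 10) -> list:
    """Call GetRecommendations via a boto3 personalize-runtime client; return item IDs."""
    response = runtime.get_recommendations(
        campaignArn=campaign_arn, itemId=item_id, numResults=num_results
    )
    return [item["itemId"] for item in response["itemList"]]
```

With real clients, `similar_item_ids(boto3.client("personalize-runtime"), campaign_arn, "1")` would return the IDs of movies similar to the movie with ID `1`.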