<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Akın</title>
    <description>The latest articles on DEV Community by Akın (@akin).</description>
    <link>https://dev.to/akin</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F988948%2Fddcc6947-539a-40b3-a0e1-2f886034324b.jpg</url>
      <title>DEV Community: Akın</title>
      <link>https://dev.to/akin</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/akin"/>
    <language>en</language>
    <item>
      <title>Building an Anime Recommendation System with PySpark in SageMaker</title>
      <dc:creator>Akın</dc:creator>
      <pubDate>Sun, 17 Mar 2024 10:35:27 +0000</pubDate>
      <link>https://dev.to/akin/building-an-anime-recommendation-system-with-pyspark-in-sagemaker-2kei</link>
      <guid>https://dev.to/akin/building-an-anime-recommendation-system-with-pyspark-in-sagemaker-2kei</guid>
      <description>&lt;h4&gt;
  
  
  Demonstration of an Anime Recommendation System with PySpark in SageMaker &lt;code&gt;v1.0.0-alpha01&lt;/code&gt;
&lt;/h4&gt;

&lt;h3&gt;
  
  
  Understanding the Dataset
&lt;/h3&gt;

&lt;p&gt;Our journey begins with understanding the dataset. We will be using the &lt;a href="https://www.kaggle.com/datasets/azathoth42/myanimelist?datasetId=28524&amp;amp;sortBy=voteCount"&gt;MyAnimeList dataset&lt;/a&gt; sourced from Kaggle, which contains valuable information about anime titles, user ratings, and more. This dataset will serve as the foundation for our recommendation system.&lt;/p&gt;

&lt;h3&gt;
  
  
  Preprocessing the Data
&lt;/h3&gt;

&lt;p&gt;Before diving into model building, we will preprocess the dataset to ensure it is clean and structured for analysis. While the preprocessing steps have already been completed, we will briefly discuss the importance of data preprocessing in the context of recommendation systems.&lt;/p&gt;

&lt;p&gt;For the detailed preprocessing, check out this notebook:&lt;br&gt;
&lt;a href="https://github.com/muratsahilli/pyspark-recommendation-system/blob/main/anime-recommendation-system.ipynb"&gt;anime-recommendation-system.ipynb&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;TODO: Add all preprocessing step-by-step with explanations&lt;/code&gt;&lt;/p&gt;
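&lt;p&gt;Until that TODO is filled in, here is a minimal, Spark-free sketch of the kind of cleaning involved. The column names and the convention that a score of 0 means "listed but not rated" are assumptions about the MyAnimeList dump, not an excerpt from the notebook.&lt;/p&gt;

```python
def clean_ratings(rows):
    """Reduce raw list rows to (user_id, anime_id, score) triples.

    Assumes each row is a dict with 'user_id', 'anime_id' and 'my_score'
    keys (hypothetical column names). Drops rows whose score is 0
    (listed but not rated) and keeps the last occurrence when a user
    rated the same title more than once.
    """
    latest = {}
    for row in rows:
        score = int(row["my_score"])
        if score > 0:
            latest[(row["user_id"], row["anime_id"])] = score
    return [(u, a, s) for (u, a), s in latest.items()]
```

&lt;p&gt;The same filter-and-deduplicate logic translates directly into PySpark DataFrame operations on the real dataset.&lt;/p&gt;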


&lt;h3&gt;
  
  
  Visualization
&lt;/h3&gt;

&lt;p&gt;Visualizing the preprocessed data can provide valuable insights into the distribution of anime ratings, user preferences, and other patterns. We will utilize various visualization techniques to gain a better understanding of our dataset.&lt;/p&gt;

&lt;p&gt;You can find the data visualization techniques applied to better understand the content of the data in &lt;a href="https://github.com/muratsahilli/pyspark-recommendation-system/blob/main/anime-recommendation-system.ipynb"&gt;anime-recommendation-system.ipynb&lt;/a&gt;. One of them is shown below as an example.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--HecHWYA---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://github.com/muratsahilli/pyspark-recommendation-system/assets/61403011/93ba9a0f-84a4-4b48-b773-e378bec963c6" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--HecHWYA---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://github.com/muratsahilli/pyspark-recommendation-system/assets/61403011/93ba9a0f-84a4-4b48-b773-e378bec963c6" alt="image" width="800" height="633"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;TODO: Add all visualizations step-by-step with explanations and insights&lt;/code&gt;&lt;/p&gt;
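&lt;p&gt;As an example of what such a chart summarizes: a rating histogram reduces to counting how many ratings fall on each score value. The snippet below computes those counts (feeding them to a bar chart is then a single plotting call); the sample scores are made up, not taken from the dataset.&lt;/p&gt;

```python
from collections import Counter

def rating_distribution(scores):
    """Map each score value to its frequency, sorted by score.

    The resulting dict is ready to plot, e.g. with
    matplotlib's bar(counts.keys(), counts.values()).
    """
    return dict(sorted(Counter(scores).items()))
```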


&lt;h3&gt;
  
  
  Recommendation System
&lt;/h3&gt;

&lt;p&gt;We employed the Alternating Least Squares (ALS) algorithm in PySpark to build the recommendation system. To select the best model, we compared candidate parameter settings by mean squared error: each candidate was trained on the training split and evaluated on how accurately it predicted the held-out test ratings.&lt;/p&gt;

&lt;p&gt;The parameters below produced the lowest mean squared error, so we trained the final model with them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;rank&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;iter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lambda_&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ALS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;train&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rating&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rank&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;rank&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;iterations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;iter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lambda_&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;lambda_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;seed&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5047&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
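&lt;p&gt;Spark aside, the selection rule itself is small: compute the mean squared error of predicted versus held-out ratings for each (rank, iterations, lambda) candidate and keep the minimum. A standalone sketch of that rule (the numbers in the example are illustrative, not the notebook's values):&lt;/p&gt;

```python
def mean_squared_error(predicted, actual):
    """MSE over the (user_id, anime_id) pairs present in both dicts."""
    keys = [k for k in actual if k in predicted]
    return sum((predicted[k] - actual[k]) ** 2 for k in keys) / len(keys)

def best_params(mse_by_params):
    """Return the (rank, iterations, lambda_) tuple with the lowest MSE."""
    return min(mse_by_params, key=mse_by_params.get)

# Illustrative usage with made-up numbers:
actual = {(1, 10): 8.0, (2, 11): 6.0}
predicted = {(1, 10): 7.0, (2, 11): 6.0}
err = mean_squared_error(predicted, actual)  # ((7-8)**2 + 0) / 2 = 0.5
winner = best_params({(50, 10, 0.1): 0.8, (20, 5, 0.01): 1.2})  # (50, 10, 0.1)
```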



&lt;h3&gt;
  
  
  Cloud Infrastructure Setup with Terraform
&lt;/h3&gt;

&lt;p&gt;First of all, we need to upload the preprocessed data to an S3 bucket using the following commands. You can find &lt;a href="https://github.com/muratsahilli/pyspark-recommendation-system/blob/main/sagemaker/upload_data.py"&gt;upload_data.py&lt;/a&gt; there.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;sagemaker
&lt;span class="nv"&gt;$ &lt;/span&gt;python upload_data.py &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="s1"&gt;'anime-recommendation-system'&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'eu-central-1'&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; &lt;span class="s1"&gt;'../preprocessed_data'&lt;/span&gt;

../preprocessed_data&lt;span class="se"&gt;\u&lt;/span&gt;ser.csv  4579798 / 4579798.0  &lt;span class="o"&gt;(&lt;/span&gt;100.00%&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After that, run the Terraform commands to create an Amazon SageMaker notebook instance:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/muratsahilli/pyspark-recommendation-system/blob/main/sagemaker/instance.tf"&gt;instance.tf&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;terraform init
Initializing the backend...

Initializing provider plugins...
- Finding latest version of hashicorp/aws...
- Installing hashicorp/aws v5.41.0...
- Installed hashicorp/aws v5.41.0 &lt;span class="o"&gt;(&lt;/span&gt;signed by HashiCorp&lt;span class="o"&gt;)&lt;/span&gt;
...
&lt;span class="nv"&gt;$ &lt;/span&gt;terraform plan
...
Plan: 4 to add, 0 to change, 0 to destroy.
&lt;span class="nv"&gt;$ &lt;/span&gt;terraform apply &lt;span class="nt"&gt;--auto-approve&lt;/span&gt;
aws_iam_policy.sagemaker_s3_full_access: Creating...
aws_iam_role.sagemaker_role: Creating...
aws_iam_policy.sagemaker_s3_full_access: Creation &lt;span class="nb"&gt;complete &lt;/span&gt;after 1s &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;arn:aws:iam::749270828329:policy/SageMaker_S3FullAccessPoliciy]
aws_iam_role.sagemaker_role: Creation &lt;span class="nb"&gt;complete &lt;/span&gt;after 1s &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;AnimeRecommendation_SageMakerRole]
aws_iam_role_policy_attachment.sagemaker_s3_policy_attachment: Creating...
aws_sagemaker_notebook_instance.notebookinstance: Creating...
aws_iam_role_policy_attachment.sagemaker_s3_policy_attachment: Creation &lt;span class="nb"&gt;complete &lt;/span&gt;after 1s &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;AnimeRecommendation_SageMakerRole-20240317120654686300000001]
aws_sagemaker_notebook_instance.notebookinstance: Still creating... &lt;span class="o"&gt;[&lt;/span&gt;10s elapsed]
...
Apply &lt;span class="nb"&gt;complete&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt; Resources: 4 added, 0 changed, 0 destroyed.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Create a new conda_python3 notebook and run &lt;a href="https://github.com/muratsahilli/pyspark-recommendation-system/blob/main/sagemaker/sagemaker-anime-recommendation-system.ipynb"&gt;&lt;code&gt;sagemaker-anime-recommendation-system.ipynb&lt;/code&gt;&lt;/a&gt; step by step on the notebook instance
&lt;/h3&gt;

&lt;p&gt;You can look at the &lt;code&gt;sagemaker-anime-recommendation-system-test.html&lt;/code&gt; file to see the output.&lt;/p&gt;

&lt;p&gt;In this code, the model data is saved locally and then uploaded to an S3 bucket.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SparkContext&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getOrCreate&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="c1"&gt;# Initialize S3 client
&lt;/span&gt;&lt;span class="n"&gt;s3&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s3&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Upload files to the created bucket
&lt;/span&gt;&lt;span class="n"&gt;bucketname&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;anime-recommendation-system&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;local_directory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;./model&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;destination&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;model/&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;root&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dirs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;files&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;walk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;local_directory&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;files&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# construct the full local path
&lt;/span&gt;        &lt;span class="n"&gt;local_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;root&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;relative_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;relpath&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;local_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;local_directory&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;s3_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;destination&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;relative_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;s3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upload_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;local_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bucketname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;s3_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--ku5EWNb4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://github.com/akinbezatoglu/pyspark-recommendation-system/assets/61403011/05b111f2-f12b-408a-b1c4-adbfd377a07b" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--ku5EWNb4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://github.com/akinbezatoglu/pyspark-recommendation-system/assets/61403011/05b111f2-f12b-408a-b1c4-adbfd377a07b" alt="image" width="800" height="129"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--jyx05Uhq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://github.com/akinbezatoglu/pyspark-recommendation-system/assets/61403011/6a5a3dd6-2d5f-4aa1-88fb-a4469ccfc095" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--jyx05Uhq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://github.com/akinbezatoglu/pyspark-recommendation-system/assets/61403011/6a5a3dd6-2d5f-4aa1-88fb-a4469ccfc095" alt="image" width="800" height="324"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  To load and test the model, run the &lt;a href="https://github.com/muratsahilli/pyspark-recommendation-system/blob/main/sagemaker/test_another_instance.ipynb"&gt;&lt;code&gt;test_another_instance.ipynb&lt;/code&gt;&lt;/a&gt; Jupyter notebook
&lt;/h3&gt;

&lt;p&gt;In this code, the model data is retrieved from the S3 bucket and loaded with &lt;code&gt;MatrixFactorizationModel&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt; 

&lt;span class="n"&gt;s3_resource&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;resource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s3&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;bucket&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;s3_resource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Bucket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;anime-recommendation-system&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;obj&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;objects&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Prefix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dirname&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
        &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;makedirs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dirname&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;download_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;obj&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pyspark&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SparkContext&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pyspark.mllib.recommendation&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MatrixFactorizationModel&lt;/span&gt;

&lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;MatrixFactorizationModel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SparkContext&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getOrCreate&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In summary, I showed how to train the model in a SageMaker notebook instance, save the model data to S3, and load it back in a separate environment, which makes the model easy to reuse. See you in another article...👋👋&lt;/p&gt;

</description>
      <category>pyspark</category>
      <category>sagemaker</category>
      <category>aws</category>
      <category>demo</category>
    </item>
    <item>
      <title>My Huawei Cloud Practicum Experience</title>
      <dc:creator>Akın</dc:creator>
      <pubDate>Sat, 07 Jan 2023 18:13:18 +0000</pubDate>
      <link>https://dev.to/akin/my-huawei-cloud-practicum-experience-1n2c</link>
      <guid>https://dev.to/akin/my-huawei-cloud-practicum-experience-1n2c</guid>
      <description>&lt;p&gt;This article is about my experience in the &lt;a href="https://www.patika.dev/bootcamp/huawei-cloud-practicum"&gt;Huawei Cloud Practicum&lt;/a&gt; organized by &lt;a href="https://www.patika.dev/"&gt;Patika&lt;/a&gt;, sponsored by &lt;a href="https://www.huaweicloud.com/intl/en-us/"&gt;Huawei Cloud&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Intro&lt;/li&gt;
&lt;li&gt;First Case&lt;/li&gt;
&lt;li&gt;Homework&lt;/li&gt;
&lt;li&gt;Final Project&lt;/li&gt;
&lt;li&gt;Conclusion&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Intro &lt;a&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;I am &lt;a href="https://www.linkedin.com/in/akinbe/"&gt;Akın&lt;/a&gt;, and I am improving my skills in cloud computing. I wanted to add hands-on experience to my learning by participating in this practicum. In this blog post, I will describe what I did along the way.&lt;/p&gt;

&lt;p&gt;The practicum was a six-week program. To be admitted, applicants were asked to complete a given case in whatever way they preferred.&lt;/p&gt;




&lt;h2&gt;
  
  
  First Case &lt;a&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;We were asked to demonstrate how communication works between VPCs by establishing peering connections. Since the case was simple, I set a challenge for myself: around this time I was starting to learn Terraform, so I wanted to add hands-on experience to that learning process and develop the infrastructure as code with Terraform.&lt;/p&gt;

&lt;p&gt;My first job was to review the code samples on the web and in the &lt;a href="https://github.com/huaweicloud/terraform-provider-huaweicloud"&gt;terraform-provider-huaweicloud&lt;/a&gt; repository and to understand variables in Terraform. After a week, I completed the case and brought together what I did during the process. You can reach my &lt;a href="https://developer.huaweicloud.com/intl/en-us/forum/topic/0251106413569389009"&gt;blog post&lt;/a&gt; and &lt;a href="https://github.com/akinbezatoglu/patika-hwc-practicum-case"&gt;Terraform code&lt;/a&gt; via these links.&lt;/p&gt;




&lt;h2&gt;
  
  
  Homework &lt;a&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;During the first two weeks, we completed five exercises in &lt;a href="https://lab.huaweicloud.com/intl/en-us/?ticket=ST-8426209-z6QKLsBKKw6hbxHnMiRmpuaF-sso"&gt;Koolabs&lt;/a&gt;. With these exercises, we started to experience and learn Huawei Cloud services with a hands-on approach. In the third week, we did an exercise aimed at using the basic features of a &lt;a href="https://www.huaweicloud.com/intl/en-us/product/cce.html"&gt;CCE&lt;/a&gt; (Kubernetes) cluster, experiencing the CCE service hands-on.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Project &lt;a&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;In the final project, I converted my monolithic survey management system, which I had previously developed, into a serverless architecture and tried to adapt it to the cloud. You can find my project code &lt;a href="https://github.com/akinbezatoglu/survey-builder"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe899b7h6zugoqr5y14h7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe899b7h6zugoqr5y14h7.png" alt="architecture" width="800" height="448"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Creation of serverless functions&lt;/li&gt;
&lt;li&gt;Creating a pipeline with DevCloud's CloudPipeline&lt;/li&gt;
&lt;li&gt;Serverless backend with FunctionGraph&lt;/li&gt;
&lt;li&gt;Triggering functions with API Gateway&lt;/li&gt;
&lt;li&gt;Static website hosting with OBS&lt;/li&gt;
&lt;/ol&gt;




&lt;h3&gt;
  
  
  1. Creation of serverless functions &lt;a&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;First, I created the internal application logic and serverless functions by separating the API handlers in the monolith app into &lt;a href="https://github.com/akinbezatoglu/survey-builder/tree/master/functions"&gt;events&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Structure of the project

survey-builder
 ┣ functions
 ┃ ┣ register
 ┃ ┃ ┗ main.go
 ┃ ┣ ...
 ┃ ┗ ...
 ┣ internal
 ┃ ┣ database
 ┃ ┣ go-runtime
 ┃ ┣ handler
 ┃ ┣ model
 ┃ ┗ service
 ┣ vendor
 ┣ go.mod
 ┗ go.sum
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Below is an example serverless function for the register event. I handled APIG trigger events using Huawei Cloud's FunctionGraph SDK and used &lt;a href="https://www.huaweicloud.com/intl/en-us/product/gaussdbformongo.html"&gt;GaussDB for NoSQL&lt;/a&gt; as the database.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;package main

import (
    "bytes"
    "encoding/base64"
    "encoding/json"
    "net/http"

    "huaweicloud.com/akinbe/survey-builder-app/internal/go-runtime/events/apig"
    "huaweicloud.com/akinbe/survey-builder-app/internal/go-runtime/go-api/context"
    "huaweicloud.com/akinbe/survey-builder-app/internal/go-runtime/pkg/runtime"
    "huaweicloud.com/akinbe/survey-builder-app/internal/handler"
)

func RegisterHandler(payload []byte, ctx context.RuntimeContext) (interface{}, error) {
    // Handle Apig Trigger Event Request
    var apigEvent apig.APIGTriggerEvent
    err := json.Unmarshal(payload, &amp;amp;apigEvent)
    if err != nil {
        apigResp := apig.APIGTriggerResponse{
            Body: err.Error(),
            Headers: map[string]string{
                "content-type": "application/json",
            },
            StatusCode: http.StatusBadRequest,
        }
        return apigResp, nil
    }

    // Parse the 'data' value from the trigger event, which takes the 'data' value
    // from the request sent by the client and carries it to the backend.
    // Then, decode the base64 encoded data.
    data, _ := base64.StdEncoding.DecodeString(apigEvent.PathParameters["data"])
    data_str := bytes.NewBuffer(data).String()

    // Event Logic
    viewuser, status, err := handler.UserSignupPostHandler(data_str)
    if err != nil {
        apigResp := apig.APIGTriggerResponse{
            Body: err.Error(),
            Headers: map[string]string{
                "content-type": "application/json",
            },
            StatusCode: status,
        }
        return apigResp, nil
    } else {
        user, _ := json.Marshal(viewuser)
        user_str := string(user)
        apigResp := apig.APIGTriggerResponse{
            Body: user_str,
            Headers: map[string]string{
                "content-type": "application/json",
            },
            StatusCode: http.StatusOK,
        }
        return apigResp, nil
    }
}

func main() {
    runtime.Register(RegisterHandler)
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  2. Creating a pipeline with DevCloud's CloudPipeline &lt;a&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;With &lt;a href="https://www.huaweicloud.com/intl/en-us/product/devcloud.html"&gt;DevCloud&lt;/a&gt;'s Cloud Pipeline, I built the functions with Docker and pushed the images to &lt;a href="https://www.huaweicloud.com/intl/en-us/product/swr.html"&gt;SWR&lt;/a&gt;. Using the container image option in FunctionGraph, I created the functions by pulling the images from SWR. You can view the Dockerfile of the register event below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FROM golang:1.19-alpine AS builder

RUN apk add --update --no-cache gcc git build-base

WORKDIR /src
COPY . /src
RUN CGO_ENABLED=0 go build -o /bin/register

FROM scratch
COPY --from=builder /bin/register /bin/register
ENTRYPOINT ["/bin/register"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fap2y7ty2eiwxa2gv6scx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fap2y7ty2eiwxa2gv6scx.png" alt="docker-image-swr" width="800" height="49"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I tested the functions in &lt;a href="https://www.huaweicloud.com/intl/en-us/product/functiongraph.html"&gt;FunctionGraph&lt;/a&gt;. However, I resorted to another method because I kept getting the error below and could not find a solution.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;function invocation exception, error: CrashLoopBackOff: The application inside the container keeps crashing&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I built and packaged the functions with the bash script below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#! /bin/bash

for dir in functions/*/; do
  # Extract the function name from the directory path
  function_name=$(basename "$dir")
  cd ${GOPATH}/src/huaweicloud.com/akinbe/survey-builder/functions/$function_name
  package="${function_name}_go1.x.zip"
  go build -o handler main.go
  zip $package handler
done

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With a Cloud Build task in DevCloud's Cloud Pipeline, I transferred the zip files to &lt;a href="https://www.huaweicloud.com/intl/en-us/product/obs.html"&gt;OBS&lt;/a&gt; (Object Storage Service), then created the functions in FunctionGraph by pulling the zip files from OBS. After that, I tested all the functions successfully with APIG events.&lt;/p&gt;




&lt;h3&gt;
  
  
  3. Serverless backend with FunctionGraph &lt;a&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Below is the test case for the register function. It returned HTTP status 500 because there was no database connection at the time.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9o44350r4i48f23n6fdw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9o44350r4i48f23n6fdw.png" alt="apig-event-test" width="800" height="178"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  4. Triggering serverless functions with API Gateway &lt;a&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;In &lt;a href="https://www.huaweicloud.com/intl/en-us/product/apig.html"&gt;API Gateway&lt;/a&gt;, I created my APIs in an API group where each endpoint corresponds to a function, and I configured the backend of each API with the corresponding function in FunctionGraph.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdqumefdt4zcjgmi9mta4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdqumefdt4zcjgmi9mta4.png" alt="apig-register-backend-configuration" width="800" height="378"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Below is the test of the register event at the /api/v1/signup endpoint.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdkns1aeingmxpdytpht1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdkns1aeingmxpdytpht1.png" alt="apig-register-debug" width="800" height="336"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  5. OBS Static Web Site Hosting &lt;a&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;I hosted my frontend files in OBS with static website hosting. However, I could not give a full demonstration, because the POST requests sent from the client to the APIs returned 405 Method Not Allowed responses.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion &lt;a&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;As a result, I tried to implement a project with a cloud-first approach, using the most appropriate service at every step: an automated pipeline in DevCloud's Cloud Pipeline, a serverless architecture in FunctionGraph, and a gateway that serves the backend logic to clients in API Gateway.&lt;/p&gt;

&lt;p&gt;In this process, I had the chance to get to know and use Huawei Cloud services hands-on. My thanks to everyone who made this process possible. &lt;br&gt;
See you in my next blog posts... 👋👋&lt;/p&gt;

</description>
      <category>cloud</category>
      <category>huaweicloud</category>
      <category>showdev</category>
      <category>serverless</category>
    </item>
  </channel>
</rss>
