<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: akoshel</title>
    <description>The latest articles on DEV Community by akoshel (@akoshel).</description>
    <link>https://dev.to/akoshel</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F950069%2Fcff84992-06b4-499b-b1a5-9b6a099a30f5.jpg</url>
      <title>DEV Community: akoshel</title>
      <link>https://dev.to/akoshel</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/akoshel"/>
    <language>en</language>
    <item>
      <title>Seamless Deployment of Hugging Face Models on AWS SageMaker with Terraform: A Comprehensive Guide</title>
      <dc:creator>akoshel</dc:creator>
      <pubDate>Sun, 18 Feb 2024 13:45:36 +0000</pubDate>
      <link>https://dev.to/akoshel/seamless-deployment-of-hugging-face-models-on-aws-sagemaker-with-terraform-a-comprehensive-guide-362g</link>
      <guid>https://dev.to/akoshel/seamless-deployment-of-hugging-face-models-on-aws-sagemaker-with-terraform-a-comprehensive-guide-362g</guid>
      <description>&lt;p&gt;When integrating Sagemaker with Hugging Face models using the default setup provided by the sagemaker-huggingface-inference-tollkit can be a good starting point. For a IaC setup, the terraform-aws-sagemaker-huggingface module is a handy resource, &lt;a href="https://github.com/philschmid/terraform-aws-sagemaker-huggingface/blob/master/main.tf"&gt;https://github.com/philschmid/terraform-aws-sagemaker-huggingface/blob/master/main.tf&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;However, in my experience I ran into a few issues with the sagemaker-huggingface-inference-toolkit:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deployment Flexibility:&lt;/strong&gt; According to the docs, the toolkit only supports deployment through the Python SDK, which is quite restrictive. (In practice you can deploy in other ways, for example with the Terraform module mentioned above.)&lt;br&gt;
    &lt;strong&gt;Code and Model Packaging:&lt;/strong&gt; Customizing the inference code requires bundling it with the model weights in a single tar file, which feels clunky; I prefer having the code as part of the image itself.&lt;br&gt;
    &lt;strong&gt;Custom Environments:&lt;/strong&gt; The sagemaker-huggingface-inference-toolkit doesn't allow for custom environment setups, like installing the latest Transformers directly from GitHub.&lt;/p&gt;

&lt;p&gt;One specific issue was the lack of support for setting torch_dtype to half precision for pipelines, which was crucial for my project but not straightforward to implement.&lt;/p&gt;

&lt;p&gt;Given these limitations, I decided against rewriting everything for the default sagemaker-inference-toolkit and instead explored a solution that simply overrides the get_pipeline function in sagemaker-huggingface-inference-toolkit. Using the following example, you can customize the pipeline any way you like.&lt;/p&gt;
&lt;h2&gt;
  
  
  How to Deploy
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Load model weights
&lt;/h3&gt;

&lt;p&gt;The first step is to upload the model weights to an S3 bucket as a model.tar.gz file. Instructions are here: &lt;a href="https://huggingface.co/docs/sagemaker/inference"&gt;https://huggingface.co/docs/sagemaker/inference&lt;/a&gt;&lt;/p&gt;
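&lt;p&gt;As a minimal sketch of the packaging step (directory and bucket names here are illustrative placeholders): the model files should sit at the root of the archive, which a few lines of Python can guarantee. The upload itself is a single boto3 call, shown as a comment because it needs AWS credentials.&lt;/p&gt;

```python
# Sketch: package a local model directory as a SageMaker-style model.tar.gz.
# The directory and bucket names are placeholders.
import tarfile
from pathlib import Path


def package_model(model_dir: str, out_path: str = "model.tar.gz") -> str:
    """Create model.tar.gz with the model files at the archive root."""
    files = [f for f in sorted(Path(model_dir).rglob("*")) if f.is_file()]
    with tarfile.open(out_path, "w:gz") as tar:
        for f in files:
            # arcname relative to model_dir so the weights land at the tar root
            tar.add(f, arcname=str(f.relative_to(model_dir)))
    return out_path


# Upload (requires AWS credentials; bucket and key are placeholders):
# import boto3
# boto3.client("s3").upload_file("model.tar.gz", "my-bucket", "models/model.tar.gz")
```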
&lt;h3&gt;
  
  
  Make entrypoint
&lt;/h3&gt;

&lt;p&gt;The deployment starts with setting up an entrypoint script. This script acts as the bridge between your model and Sagemaker, telling Sagemaker how to run your model. Here's a basic template I used:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pathlib&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Pipeline&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pipeline&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sagemaker_huggingface_inference_toolkit&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;transformers_utils&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;serving&lt;/span&gt;




&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_get_pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_dir&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Pipeline&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model_dir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;device_map&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_kwargs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;torch_dtype&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bfloat16&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="n"&gt;transformers_utils&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_pipeline&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_get_pipeline&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;serving&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Build image
&lt;/h3&gt;

&lt;p&gt;Next, you'll need to build a Docker image that Sagemaker can use to run your model. This involves starting from a basic Transformers PyTorch image (&lt;a href="https://github.com/huggingface/transformers/blob/main/docker/transformers-pytorch-gpu/Dockerfile"&gt;https://github.com/huggingface/transformers/blob/main/docker/transformers-pytorch-gpu/Dockerfile&lt;/a&gt;), then installing sagemaker-huggingface-inference-toolkit with MMS (Multi Model Server) and OpenJDK, and configuring the entrypoint.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; nvidia/cuda:12.1.0-cudnn8-devel-ubuntu20.04&lt;/span&gt;
&lt;span class="k"&gt;LABEL&lt;/span&gt;&lt;span class="s"&gt; maintainer="Hugging Face"&lt;/span&gt;

&lt;span class="k"&gt;ARG&lt;/span&gt;&lt;span class="s"&gt; DEBIAN_FRONTEND=noninteractive&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;apt update
&lt;span class="k"&gt;RUN &lt;/span&gt;apt &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; git libsndfile1-dev tesseract-ocr espeak-ng python3 python3-pip ffmpeg
&lt;span class="k"&gt;RUN &lt;/span&gt;python3 &lt;span class="nt"&gt;-m&lt;/span&gt; pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--no-cache-dir&lt;/span&gt; &lt;span class="nt"&gt;--upgrade&lt;/span&gt; pip

&lt;span class="k"&gt;ARG&lt;/span&gt;&lt;span class="s"&gt; REF=main&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;git clone https://github.com/huggingface/transformers &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;cd &lt;/span&gt;transformers &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; git checkout &lt;span class="nv"&gt;$REF&lt;/span&gt;

&lt;span class="c"&gt;# If set to nothing, will install the latest version&lt;/span&gt;
&lt;span class="k"&gt;ARG&lt;/span&gt;&lt;span class="s"&gt; PYTORCH='1.13.1'&lt;/span&gt;
&lt;span class="k"&gt;ARG&lt;/span&gt;&lt;span class="s"&gt; TORCH_VISION=''&lt;/span&gt;
&lt;span class="k"&gt;ARG&lt;/span&gt;&lt;span class="s"&gt; TORCH_AUDIO=''&lt;/span&gt;
&lt;span class="c"&gt;# Example: `cu102`, `cu113`, etc.&lt;/span&gt;
&lt;span class="k"&gt;ARG&lt;/span&gt;&lt;span class="s"&gt; CUDA='cu121'&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="k"&gt;${#&lt;/span&gt;&lt;span class="nv"&gt;PYTORCH&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="nt"&gt;-gt&lt;/span&gt; 0 &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nv"&gt;VERSION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'torch=='&lt;/span&gt;&lt;span class="nv"&gt;$PYTORCH&lt;/span&gt;&lt;span class="s1"&gt;'.*'&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt;  &lt;span class="nv"&gt;VERSION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'torch'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; python3 &lt;span class="nt"&gt;-m&lt;/span&gt; pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--no-cache-dir&lt;/span&gt; &lt;span class="nt"&gt;-U&lt;/span&gt; &lt;span class="nv"&gt;$VERSION&lt;/span&gt; &lt;span class="nt"&gt;--extra-index-url&lt;/span&gt; https://download.pytorch.org/whl/&lt;span class="nv"&gt;$CUDA&lt;/span&gt;
&lt;span class="c"&gt;# RUN [ ${#TORCH_VISION} -gt 0 ] &amp;amp;&amp;amp; VERSION='torchvision=='TORCH_VISION'.*' ||  VERSION='torchvision'; python3 -m pip install --no-cache-dir -U $VERSION --extra-index-url https://download.pytorch.org/whl/$CUDA&lt;/span&gt;
&lt;span class="c"&gt;# RUN [ ${#TORCH_AUDIO} -gt 0 ] &amp;amp;&amp;amp; VERSION='torchaudio=='TORCH_AUDIO'.*' ||  VERSION='torchaudio'; python3 -m pip install --no-cache-dir -U $VERSION --extra-index-url https://download.pytorch.org/whl/$CUDA&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;python3 &lt;span class="nt"&gt;-m&lt;/span&gt; pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--no-cache-dir&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; ./transformers

&lt;span class="c"&gt;# When installing in editable mode, `transformers` is not recognized as a package.&lt;/span&gt;
&lt;span class="c"&gt;# this line must be added in order for python to be aware of transformers.&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;transformers &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; python3 setup.py develop


&lt;span class="k"&gt;RUN &lt;/span&gt;apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    openjdk-8-jdk-headless
&lt;span class="k"&gt;RUN &lt;/span&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="s2"&gt;"sagemaker-huggingface-inference-toolkit[mms]"&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; ./entrypoint.py /usr/local/bin/entrypoint.py&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nb"&gt;chmod&lt;/span&gt; +x /usr/local/bin/entrypoint.py

&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; /home/model-server/


&lt;span class="c"&gt;# Define an entrypoint script for the docker image&lt;/span&gt;
&lt;span class="k"&gt;ENTRYPOINT&lt;/span&gt;&lt;span class="s"&gt; ["python3", "/usr/local/bin/entrypoint.py"]&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, push your image to your ECR repository.&lt;/p&gt;

&lt;h3&gt;
  
  
  Deploy using terraform
&lt;/h3&gt;

&lt;p&gt;Finally, you'll use Terraform to deploy everything to AWS. This includes setting up the endpoint role, model, its endpoint configuration, and the endpoint itself. Here's a simplified version of what the Terraform setup might look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_sagemaker_model"&lt;/span&gt; &lt;span class="s2"&gt;"customHuggingface"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"custom-huggingface"&lt;/span&gt;

  &lt;span class="nx"&gt;primary_container&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;image&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"&amp;lt;YOUR_ACCOUNT&amp;gt;.dkr.ecr.&amp;lt;REGION&amp;gt;.amazonaws.com/&amp;lt;REPO&amp;gt;:&amp;lt;TAG&amp;gt;"&lt;/span&gt;
    &lt;span class="nx"&gt;model_data_url&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"s3://&amp;lt;BUKET&amp;gt;/&amp;lt;PATH&amp;gt;/model.tar.gz"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;


&lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_policy_document"&lt;/span&gt; &lt;span class="s2"&gt;"assume_role"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;statement&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;actions&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"sts:AssumeRole"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="nx"&gt;principals&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;type&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Service"&lt;/span&gt;
      &lt;span class="nx"&gt;identifiers&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"sagemaker.amazonaws.com"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;


&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role"&lt;/span&gt; &lt;span class="s2"&gt;"yourRole"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;               &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"yourRole"&lt;/span&gt;
  &lt;span class="nx"&gt;assume_role_policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_iam_policy_document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;assume_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;json&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_policy_document"&lt;/span&gt; &lt;span class="s2"&gt;"InferenceAcess"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;statement&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;actions&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"s3:GetObject"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="nx"&gt;resources&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:s3:::&amp;lt;yourBucket&amp;gt;/*"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="nx"&gt;statement&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;actions&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="s2"&gt;"ecr:GetAuthorizationToken"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s2"&gt;"ecr:BatchCheckLayerAvailability"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s2"&gt;"ecr:GetDownloadUrlForLayer"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s2"&gt;"ecr:GetRepositoryPolicy"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s2"&gt;"ecr:SetRepositoryPolicy"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s2"&gt;"ecr:DescribeRepositories"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s2"&gt;"ecr:ListImages"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s2"&gt;"ecr:DescribeImages"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s2"&gt;"ecr:BatchGetImage"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s2"&gt;"ecr:GetLifecyclePolicy"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s2"&gt;"ecr:GetLifecyclePolicyPreview"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s2"&gt;"ecr:ListTagsForResource"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s2"&gt;"ecr:DescribeImageScanFindings"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s2"&gt;"ecr:InitiateLayerUpload"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="nx"&gt;resources&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;YOUR_ECR&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="nx"&gt;statement&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;resources&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"*"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="nx"&gt;actions&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="s2"&gt;"cloudwatch:PutMetricData"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s2"&gt;"logs:CreateLogStream"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s2"&gt;"logs:PutLogEvents"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s2"&gt;"logs:CreateLogGroup"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="s2"&gt;"logs:DescribeLogStreams"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_policy"&lt;/span&gt; &lt;span class="s2"&gt;"InferenceAcess"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"InferenceAcess"&lt;/span&gt;
  &lt;span class="nx"&gt;policy&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_iam_policy_document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;InferenceAcess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;json&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role_policy_attachment"&lt;/span&gt; &lt;span class="s2"&gt;"InferenceAcess"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;role&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_iam_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;yourRole&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;
  &lt;span class="nx"&gt;policy_arn&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_iam_policy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;InferenceAcess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_sagemaker_endpoint_configuration"&lt;/span&gt; &lt;span class="s2"&gt;"customHuggingface"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"customHuggingface"&lt;/span&gt;

  &lt;span class="nx"&gt;production_variants&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;variant_name&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"variant-1"&lt;/span&gt;
    &lt;span class="nx"&gt;model_name&lt;/span&gt;             &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_sagemaker_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;customHuggingface&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;
    &lt;span class="nx"&gt;initial_instance_count&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="nx"&gt;instance_type&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"ml.g4dn.xlarge"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_sagemaker_endpoint"&lt;/span&gt; &lt;span class="s2"&gt;"customHuggingface"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;                 &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"customHuggingface"&lt;/span&gt;
  &lt;span class="nx"&gt;endpoint_config_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_sagemaker_endpoint_configuration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;customHuggingface&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Invoke your endpoint
&lt;/h3&gt;

&lt;p&gt;After everything is deployed, you can test the endpoint with a simple request to make sure it's working as expected.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;body = json.dumps({"inputs": &amp;lt;Your text&amp;gt;})
endpoint = "customHuggingface"
response = runtime.invoke_endpoint(EndpointName=endpoint, ContentType='application/json', Body=body)
response["Body"].read()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Useful links:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/philschmid/terraform-aws-sagemaker-huggingface/blob/master/main.tf"&gt;https://github.com/philschmid/terraform-aws-sagemaker-huggingface/blob/master/main.tf&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://huggingface.co/docs/sagemaker/inference"&gt;https://huggingface.co/docs/sagemaker/inference&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/aws/sagemaker-huggingface-inference-toolkit"&gt;https://github.com/aws/sagemaker-huggingface-inference-toolkit&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/build-multi-model-build-container.html"&gt;https://docs.aws.amazon.com/sagemaker/latest/dg/build-multi-model-build-container.html&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>terraform</category>
      <category>huggingface</category>
      <category>aws</category>
      <category>sagemaker</category>
    </item>
    <item>
      <title>Optimize spark on kubernetes</title>
      <dc:creator>akoshel</dc:creator>
      <pubDate>Sat, 01 Apr 2023 07:50:15 +0000</pubDate>
      <link>https://dev.to/akoshel/optimize-spark-on-kubernetes-32la</link>
      <guid>https://dev.to/akoshel/optimize-spark-on-kubernetes-32la</guid>
      <description>&lt;p&gt;This is my second post about Spark on Kubernetes. I wanted to share my experience with reducing the costs of Spark computation in clouds, which can be expensive, but can be decreased by 60-70%. I am using Spark version 3.3.1.&lt;/p&gt;

&lt;p&gt;1. If you are running your research in client mode from an IPython notebook, it is recommended to &lt;strong&gt;use dynamic allocation&lt;/strong&gt;. With this configuration, an executor pod exists only during compute time, after which the executor stops.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;spark.dynamicAllocation.enabled                     true
spark.dynamicAllocation.shuffleTracking.enabled     true
spark.dynamicAllocation.shuffleTracking.timeout     120
spark.dynamicAllocation.minExecutors                0
spark.dynamicAllocation.maxExecutors                10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
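&lt;p&gt;In client mode the same properties can also be passed on the command line; a small, hypothetical helper that turns the block above into spark-submit arguments:&lt;/p&gt;

```python
# Hypothetical helper: render Spark properties as spark-submit "--conf" flags.
# The property names and values mirror the dynamic-allocation block above.
DYNAMIC_ALLOCATION = {
    "spark.dynamicAllocation.enabled": "true",
    "spark.dynamicAllocation.shuffleTracking.enabled": "true",
    "spark.dynamicAllocation.shuffleTracking.timeout": "120",
    "spark.dynamicAllocation.minExecutors": "0",
    "spark.dynamicAllocation.maxExecutors": "10",
}


def to_submit_args(conf: dict) -> list:
    """Flatten a config dict into ["--conf", "key=value", ...] pairs."""
    args = []
    for key, value in sorted(conf.items()):
        args.extend(["--conf", f"{key}={value}"])
    return args
```

&lt;p&gt;Appending to_submit_args(DYNAMIC_ALLOCATION) to a spark-submit invocation has the same effect as the spark-defaults lines above.&lt;/p&gt;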



&lt;p&gt;2. &lt;strong&gt;Using spot nodes for executors&lt;/strong&gt; significantly reduces costs (spot nodes are 60-90% cheaper than on-demand nodes). To create a spot node group, you need to label it, for example spark: spot. The driver, however, should still run on on-demand nodes.&lt;/p&gt;

&lt;p&gt;If you are running in client mode, set the following configuration&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;spark.kubernetes.executor.node.selector.spark      spot  # here you label k,v in my case k=spark, v=node
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you are using Spark Operator, use the following configuration settings:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;spec:
  driver:
    nodeSelector:
      - key1: value1
      - key2: value2
  executor:
    nodeSelector:
      - key1: value1
      - key2: value2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;P.S. Use the volume mount from the next point to keep executors' temp results in case of a spot node interruption.&lt;/p&gt;

&lt;p&gt;3. &lt;strong&gt;Use an SSD volume mount for executors.&lt;/strong&gt; As mentioned above, this keeps executor temp results in case of a spot node interruption. An SSD volume also accelerates reading and writing the temp files that Spark spills to disk. You can use the following configuration settings:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;spark.kubernetes.executor.volumes.persistentVolumeClaim.data.options.claimName    OnDemand
spark.kubernetes.executor.volumes.persistentVolumeClaim.data.options.storageClass    gp # your cloud ssd storage class
spark.kubernetes.executor.volumes.persistentVolumeClaim.data.options.sizeLimit    100Gi
spark.kubernetes.executor.volumes.persistentVolumeClaim.data.mount.path    /data
spark.kubernetes.executor.volumes.persistentVolumeClaim.data.mount.readOnly    false
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
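&lt;p&gt;The five properties follow one naming scheme: a per-volume prefix plus either an option or a mount attribute. A hypothetical helper making that structure explicit (the "data" volume name, /data mount path and "gp" storage class mirror the block above):&lt;/p&gt;

```python
# Hypothetical helper: build the executor persistentVolumeClaim settings
# for one named volume; key names mirror the configuration block above.
def executor_pvc_conf(name: str, mount_path: str, size_limit: str, storage_class: str) -> dict:
    prefix = f"spark.kubernetes.executor.volumes.persistentVolumeClaim.{name}"
    return {
        f"{prefix}.options.claimName": "OnDemand",
        f"{prefix}.options.storageClass": storage_class,
        f"{prefix}.options.sizeLimit": size_limit,
        f"{prefix}.mount.path": mount_path,
        f"{prefix}.mount.readOnly": "false",
    }
```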



&lt;p&gt;4. These are the recommended default values from "Learning Spark":&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;spark.shuffle.file.buffer                           1m
spark.file.transferTo                               false
spark.shuffle.unsafe.file.output.buffer             1m
spark.io.compression.lz4.blockSize                  512k
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;p&gt;In conclusion, following the steps above can significantly reduce the cost of running Spark computations in the cloud: dynamic allocation, spot nodes for executors, and SSD volume mounts together can cut costs by 60-90%, and the defaults recommended in "Learning Spark" help optimize performance. As always, thoroughly test any configuration changes before rolling them out to production.&lt;/p&gt;

&lt;p&gt;&lt;br&gt;&lt;br&gt;
Resources:&lt;br&gt;
&lt;a href="https://spot.io/blog/how-to-run-spark-on-kubernetes-reliably-on-spot-instances/"&gt;https://spot.io/blog/how-to-run-spark-on-kubernetes-reliably-on-spot-instances/&lt;/a&gt;&lt;br&gt;
&lt;a href="https://aws.amazon.com/blogs/compute/running-cost-optimized-spark-workloads-on-kubernetes-using-ec2-spot-instances/"&gt;https://aws.amazon.com/blogs/compute/running-cost-optimized-spark-workloads-on-kubernetes-using-ec2-spot-instances/&lt;/a&gt;&lt;br&gt;
&lt;a href="https://spark.apache.org/docs/latest/running-on-kubernetes.html"&gt;https://spark.apache.org/docs/latest/running-on-kubernetes.html&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.oreilly.com/library/view/learning-spark-2nd/9781492050032/"&gt;https://www.oreilly.com/library/view/learning-spark-2nd/9781492050032/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;br&gt;&lt;br&gt;
P.S. My first post about Spark on k8s:&lt;br&gt;
How to run Spark on kubernetes in jupyterhub&lt;br&gt;
&lt;a href="https://dev.to/akoshel/spark-on-k8s-in-jupyterhub-1da2"&gt;https://dev.to/akoshel/spark-on-k8s-in-jupyterhub-1da2&lt;/a&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>cloud</category>
      <category>spark</category>
      <category>mlops</category>
    </item>
    <item>
      <title>How to run Spark on kubernetes in jupyterhub</title>
      <dc:creator>akoshel</dc:creator>
      <pubDate>Thu, 20 Oct 2022 12:01:45 +0000</pubDate>
      <link>https://dev.to/akoshel/spark-on-k8s-in-jupyterhub-1da2</link>
      <guid>https://dev.to/akoshel/spark-on-k8s-in-jupyterhub-1da2</guid>
      <description>&lt;p&gt;This is a basic tutorial on how to run Spark in client mode from jupyterhub notebook. &lt;br&gt;
All required files are presented here &lt;a href="https://github.com/akoshel/spark-k8s-jupyterhub" rel="noopener noreferrer"&gt;https://github.com/akoshel/spark-k8s-jupyterhub&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F58721037%2F196504934-6b4892da-fb5a-45c2-8453-8d47c4279cbe.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fuser-images.githubusercontent.com%2F58721037%2F196504934-6b4892da-fb5a-45c2-8453-8d47c4279cbe.jpg" alt="DS_ARCH (1)"&gt;&lt;/a&gt;&lt;br&gt;
Final architecture&lt;/p&gt;

&lt;h3&gt;
  
  
  Motivation
&lt;/h3&gt;

&lt;p&gt;I found a lot of tutorials on this topic, and almost all of them rely on heavily customized Spark and jupyterhub deployments. So I decided to minimize custom configuration and use the stock open-source solutions as much as possible.&lt;/p&gt;

&lt;h3&gt;
  
  
  Install minikube &amp;amp; helm
&lt;/h3&gt;

&lt;p&gt;First, we should create the k8s infrastructure. &lt;br&gt;
Minikube installation instructions: &lt;a href="https://minikube.sigs.k8s.io/docs/start/" rel="noopener noreferrer"&gt;https://minikube.sigs.k8s.io/docs/start/&lt;/a&gt; &lt;br&gt;
Helm installation instructions: &lt;a href="https://helm.sh/docs/intro/install/" rel="noopener noreferrer"&gt;https://helm.sh/docs/intro/install/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Make local docker images available from minikube:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

eval $(minikube docker-env)


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Install spark
&lt;/h3&gt;

&lt;p&gt;Let's install Spark locally. &lt;br&gt;
Next, we will build a Spark image and run the spark-pi example with spark-submit&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

sudo apt-get -y install openjdk-8-jdk-headless
wget https://downloads.apache.org/spark/spark-3.2.2/spark-3.2.2-bin-hadoop3.2.tgz
tar xvf spark-3.2.2-bin-hadoop3.2.tgz
sudo mv spark-3.2.2-bin-hadoop3.2 /opt/spark


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Build spark image
&lt;/h3&gt;

&lt;p&gt;Spark ships with a Kubernetes Dockerfile. Let's build the Spark image&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

cat /opt/spark/kubernetes/dockerfiles/spark/Dockerfile
cd /opt/spark
docker build -t spark:latest -f kubernetes/dockerfiles/spark/Dockerfile .


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The Spark base image does not include Python, so we should build the PySpark image (/opt/spark/kubernetes/dockerfiles/spark/bindings/python/Dockerfile). &lt;br&gt;
The base image also lacks s3a and Postgres support, which is why the corresponding Maven JARs have to be added. &lt;br&gt;
See the modified image here &lt;a href="https://github.com/akoshel/spark-k8s-jupyterhub/blob/main/pyspark.Dockerfile" rel="noopener noreferrer"&gt;https://github.com/akoshel/spark-k8s-jupyterhub/blob/main/pyspark.Dockerfile&lt;/a&gt;&lt;/p&gt;
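
&lt;p&gt;The JAR additions amount to a few extra lines in that Dockerfile; a rough sketch (the artifact versions below are illustrative assumptions, see the linked file for the real ones):&lt;/p&gt;

```dockerfile
# Extra layers appended to the stock python-bindings Dockerfile (sketch).
# Versions are illustrative; pick ones matching your Hadoop build.
ADD https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.3.1/hadoop-aws-3.3.1.jar /opt/spark/jars/
ADD https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/1.11.901/aws-java-sdk-bundle-1.11.901.jar /opt/spark/jars/
ADD https://repo1.maven.org/maven2/org/postgresql/postgresql/42.5.0/postgresql-42.5.0.jar /opt/spark/jars/
```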

&lt;p&gt;Build pyspark image&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

cd /opt/spark
docker build -t pyspark:latest -f kubernetes/dockerfiles/spark/bindings/python/Dockerfile .


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Run spark-pi
&lt;/h3&gt;

&lt;p&gt;Before running the example, the namespace, service account, role and rolebinding should be deployed.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

kubectl apply -f spark_namespace.yaml
kubectl apply -f spark_sa.yaml
kubectl apply -f spark_sa_role.yaml


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
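
&lt;p&gt;For reference, these manifests boil down to a namespace plus RBAC that lets the Spark service account manage executor pods; a sketch (resource names are assumptions based on the commands above, the exact files are in the repo):&lt;/p&gt;

```yaml
# spark_namespace.yaml / spark_sa.yaml / spark_sa_role.yaml (sketch)
apiVersion: v1
kind: Namespace
metadata:
  name: spark
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark
  namespace: spark
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: spark-role
  namespace: spark
rules:
  - apiGroups: [""]
    resources: ["pods", "services", "configmaps"]
    verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: spark-role-binding
  namespace: spark
subjects:
  - kind: ServiceAccount
    name: spark
    namespace: spark
roleRef:
  kind: Role
  name: spark-role
  apiGroup: rbac.authorization.k8s.io
```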

&lt;p&gt;Now we are ready to run the spark-pi example using spark-submit&lt;br&gt;
(use kubectl cluster-info to find your master address)&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

/opt/spark/bin/spark-submit \
  --master k8s://https://192.168.49.2:8443 \
  --deploy-mode cluster \
  --driver-memory 1g \
  --conf spark.kubernetes.memoryOverheadFactor=0.5 \
  --name sparkpi-test1 \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.container.image=spark:latest \
  --conf spark.kubernetes.driver.pod.name=spark-test1-pi \
  --conf spark.kubernetes.namespace=spark \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --verbose \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.2.2.jar 1000


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Check logs&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

kubectl logs -n spark spark-test1-pi | grep "Pi is roughly"
Pi is roughly 3.1416600314166003


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Great! Spark is running on k8s.&lt;/p&gt;

&lt;h3&gt;
  
  
  Install jupyterhub
&lt;/h3&gt;

&lt;p&gt;Before installing jupyterhub, a service account, role and rolebinding should be deployed in the jupyterhub namespace&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

kubectl apply -f jupyterhub_sa.yaml
kubectl apply -f jupyterhub_sa_role.yaml


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The Spark executors run in the spark namespace, while the driver lives in a notebook pod in the jupyterhub namespace.&lt;br&gt;
For the executors to reach the driver across namespaces, we have to deploy a driver service (driver_service.yaml)&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

kubectl apply -f driver_service.yaml


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
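
&lt;p&gt;For reference, driver_service.yaml is essentially a headless Service in the jupyterhub namespace exposing the driver, block-manager and Spark UI ports; a sketch (the selector labels are assumptions and must match your singleuser pod):&lt;/p&gt;

```yaml
# driver_service.yaml (sketch): headless service pointing at the notebook pod.
# Port numbers match the SparkConf used in the notebook; labels are assumptions.
apiVersion: v1
kind: Service
metadata:
  name: driver-service
  namespace: jupyterhub
spec:
  clusterIP: None
  selector:
    app: jupyterhub
    component: singleuser-server
  ports:
    - name: driver
      port: 2222
      targetPort: 2222
    - name: blockmanager
      port: 7777
      targetPort: 7777
    - name: spark-ui
      port: 4040
      targetPort: 4040
```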

&lt;p&gt;To access the Spark UI, an ingress should be deployed&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

kubectl apply -f driver_ingress.yaml


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
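
&lt;p&gt;driver_ingress.yaml is, in essence, an Ingress routing to the driver's Spark UI port (4040 by default); a sketch, assuming an nginx ingress controller and a hypothetical host name:&lt;/p&gt;

```yaml
# driver_ingress.yaml (sketch): expose the Spark UI through the driver service.
# Host name and ingress class are assumptions.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: driver-ingress
  namespace: jupyterhub
spec:
  ingressClassName: nginx
  rules:
    - host: spark-ui.local
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: driver-service
                port:
                  number: 4040
```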

&lt;p&gt;Java is not installed in the default jupyterhub singleuser image,&lt;br&gt;
so we build a modified singleuser image.&lt;/p&gt;
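
&lt;p&gt;The modification is roughly the stock z2jh singleuser image with a JDK layered on top; a sketch (the base image and tag are assumptions, see singleuser.Dockerfile in the repo for the real file):&lt;/p&gt;

```dockerfile
# singleuser.Dockerfile (sketch): add Java on top of the stock singleuser image.
# Base image and tag are assumptions.
FROM jupyterhub/k8s-singleuser-sample:2.0.0
USER root
RUN apt-get update; apt-get install -y --no-install-recommends openjdk-8-jdk-headless; rm -rf /var/lib/apt/lists/*
USER ${NB_USER}
```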

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

docker build -f singleuser.Dockerfile -t singleuser:v1 .


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;See jhub_values.yaml. It contains the following modifications: a new image, a service account, and resources.&lt;br&gt;
Now we are ready to deploy jupyterhub&lt;/p&gt;
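
&lt;p&gt;The relevant fragment of jhub_values.yaml looks roughly like this (the service account name must match jupyterhub_sa.yaml; the resource numbers are illustrative):&lt;/p&gt;

```yaml
# jhub_values.yaml (fragment, sketch)
singleuser:
  image:
    name: singleuser
    tag: v1
    pullPolicy: IfNotPresent
  serviceAccountName: spark     # must match the SA from jupyterhub_sa.yaml
  memory:
    guarantee: 1G
    limit: 2G
  cpu:
    guarantee: 1
    limit: 2
```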

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

helm upgrade --cleanup-on-fail \
--install jupyterhub jupyterhub/jupyterhub \
--namespace jupyterhub \
--create-namespace \
--version=2.0.0 \
--values jhub_values.yaml


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The easiest way to access jupyterhub is port-forwarding from the proxy pod. Alternatively, you can configure an ingress in jhub_values.yaml&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

kubectl port-forward proxy-dd5964d5b-6lkwp  -n jupyterhub  8000:8000 # Set your pod name


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Pyspark from jupyterhub
&lt;/h3&gt;

&lt;p&gt;Open jupyterhub in your browser &lt;a href="http://localhost:8000/" rel="noopener noreferrer"&gt;http://localhost:8000/&lt;/a&gt; &lt;br&gt;
Open a terminal in jupyterhub and install a pyspark version that matches the Spark version in the image&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

pip install pyspark==3.2.2


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Create a notebook&lt;br&gt;&lt;br&gt;
Create the SparkContext&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

from pyspark import SparkConf, SparkContext

conf = (SparkConf().setMaster("k8s://https://192.168.49.2:8443") # Your master address
        .set("spark.kubernetes.container.image", "pyspark:latest") # Spark image name
        .set("spark.driver.port", "2222") # Must match the driver service
        .set("spark.driver.blockManager.port", "7777") # Must match the driver service
        .set("spark.driver.host", "driver-service.jupyterhub.svc.cluster.local") # Must match the driver service
        .set("spark.driver.bindAddress", "0.0.0.0")
        .set("spark.kubernetes.namespace", "spark")
        .set("spark.kubernetes.authenticate.driver.serviceAccountName", "spark")
        .set("spark.kubernetes.authenticate.serviceAccountName", "spark")
        .set("spark.executor.instances", "2")
        .set("spark.kubernetes.container.image.pullPolicy", "IfNotPresent")
        .set("spark.app.name", "tutorial_app"))
sc = SparkContext(conf=conf)



&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Run a Spark application&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

# Calculate the approximate sum of values in the dataset
t = sc.parallelize(range(10))
r = t.sumApprox(3)
print('Approximate sum: %s' % r)

Approximate sum: 45.0


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;See executor pods&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

kubectl get pods -n spark
NAME                                   READY   STATUS    RESTARTS   AGE
tutorial-app-d63d4c83e68ed465-exec-1   1/1     Running   0          16s
tutorial-app-d63d4c83e68ed465-exec-2   1/1     Running   0          15s


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Congratulations! PySpark is now running in client mode from jupyterhub.&lt;/p&gt;

&lt;p&gt;Further steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Configure your spark config&lt;/li&gt;
&lt;li&gt;Configure jupyterhub &lt;a href="https://z2jh.jupyter.org/en/stable/jupyterhub/customization.html" rel="noopener noreferrer"&gt;https://z2jh.jupyter.org/en/stable/jupyterhub/customization.html&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Install spark operator &lt;a href="https://googlecloudplatform.github.io/spark-on-k8s-operator/docs/quick-start-guide.html" rel="noopener noreferrer"&gt;https://googlecloudplatform.github.io/spark-on-k8s-operator/docs/quick-start-guide.html&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Resources:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://spark.apache.org/docs/latest/running-on-kubernetes.html" rel="noopener noreferrer"&gt;https://spark.apache.org/docs/latest/running-on-kubernetes.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://z2jh.jupyter.org/" rel="noopener noreferrer"&gt;https://z2jh.jupyter.org/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://scalingpythonml.com/2020/12/21/running-a-spark-jupyter-notebooks-in-client-mode-inside-of-a-kubernetes-cluster-on-arm.html" rel="noopener noreferrer"&gt;https://scalingpythonml.com/2020/12/21/running-a-spark-jupyter-notebooks-in-client-mode-inside-of-a-kubernetes-cluster-on-arm.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://oak-tree.tech/blog/spark-kubernetes-jupyter" rel="noopener noreferrer"&gt;https://oak-tree.tech/blog/spark-kubernetes-jupyter&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;P.S. My second post about spark on k8s&lt;br&gt;
Optimize spark on kubernetes&lt;br&gt;
&lt;a href="https://dev.to/akoshel/optimize-spark-on-kubernetes-32la"&gt;https://dev.to/akoshel/optimize-spark-on-kubernetes-32la&lt;/a&gt;&lt;/p&gt;

</description>
      <category>spark</category>
      <category>jupyterhub</category>
      <category>kubernetes</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
