<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Kyle Gallatin</title>
    <description>The latest articles on DEV Community by Kyle Gallatin (@kylegallatin).</description>
    <link>https://dev.to/kylegallatin</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F577116%2F3bc43dd5-5173-4fb0-a123-71ca26ece1cc.JPG</url>
      <title>DEV Community: Kyle Gallatin</title>
      <link>https://dev.to/kylegallatin</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kylegallatin"/>
    <language>en</language>
    <item>
      <title>The Fastest Way to Deploy Your ML App on AWS with Zero Best Practices</title>
      <dc:creator>Kyle Gallatin</dc:creator>
      <pubDate>Sat, 20 Mar 2021 14:30:50 +0000</pubDate>
      <link>https://dev.to/aws-builders/the-fastest-way-to-deploy-your-ml-app-on-aws-with-zero-best-practices-1p8h</link>
      <guid>https://dev.to/aws-builders/the-fastest-way-to-deploy-your-ml-app-on-aws-with-zero-best-practices-1p8h</guid>
      <description>&lt;h1&gt;
  
  
  Publish a Machine Learning API to the Internet in like…15 Minutes
&lt;/h1&gt;

&lt;p&gt;You’ve been working on your ML app and a live demo is coming up fast. You wanted to push it to GitHub, add Docker, and refactor the code, but you spent all day yesterday on some stupid pickle error you still don’t really understand. Now you have only 1 hour before the presentation and your ML app needs to be available on the internet. You just need to be able to reach your model via a browser for 15 seconds to show some executives.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--tuO9NDaQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/6962/0%2AIcd0CYpJwheyXAFJ" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--tuO9NDaQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/6962/0%2AIcd0CYpJwheyXAFJ" alt="Photo by [Braden Collum](https://unsplash.com/@bradencollum?utm_source=medium&amp;amp;utm_medium=referral) on [Unsplash](https://unsplash.com?utm_source=medium&amp;amp;utm_medium=referral)"&gt;&lt;/a&gt;&lt;em&gt;Photo by &lt;a href="https://unsplash.com/@bradencollum?utm_source=medium&amp;amp;utm_medium=referral"&gt;Braden Collum&lt;/a&gt; on &lt;a href="https://unsplash.com?utm_source=medium&amp;amp;utm_medium=referral"&gt;Unsplash&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This tutorial is the “zero best practices” way to create a public endpoint for your model on AWS. In my opinion, this is one of the shortest paths to creating a model endpoint, assuming you don’t have any other tooling set up (SCM, Docker, SageMaker) and have written a small application in Python.&lt;/p&gt;

&lt;p&gt;You’ll need:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;The AWS console&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A terminal&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Your ML app&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you don’t have an ML app and just want to follow along, here’s &lt;a href="https://github.com/kylegallatin/fast-bad-ml"&gt;the one I wrote this morning&lt;/a&gt;. My app was built with FastAPI because it was….fast…but this will work for any Flask/Dash/Django/Streamlit/whatever app.&lt;/p&gt;

&lt;h2&gt;
  
  
  Create an EC2 Instance
&lt;/h2&gt;

&lt;p&gt;Log into the console, search “EC2” and navigate to the instances page.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--HKlEyOak--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/4676/1%2AwzdH8TxsDzxdvjTjL8hPHQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--HKlEyOak--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/4676/1%2AwzdH8TxsDzxdvjTjL8hPHQ.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click “launch instance”. Now you have to select a machine type. You’re in a rush, so you just click the first eligible one you see (Amazon Linux 2 AMI).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--M28ek_Rj--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/5632/1%2A0L6fC4SeX85HqeAQ9fGH_Q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--M28ek_Rj--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/5632/1%2A0L6fC4SeX85HqeAQ9fGH_Q.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You decide to keep the default settings (leave it as a t2.micro) and click “Review and Launch”.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--WTVLOrIY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/5624/1%2A47H5-yg7TuEh-N0p7TTpcg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--WTVLOrIY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/5624/1%2A47H5-yg7TuEh-N0p7TTpcg.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Create a new key pair, click “Download Key Pair” and then launch the instance.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--RmwJhqdv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2776/1%2Auxuw_l0wMCnvkhOEYQ2Oxg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--RmwJhqdv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/2776/1%2Auxuw_l0wMCnvkhOEYQ2Oxg.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Open Port 80
&lt;/h2&gt;

&lt;p&gt;While we’re here in the console, let’s open port 80 to web traffic. Navigate back to the instances page and click on your instance.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--b1DyO77d--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/4464/1%2AdO7Sxv9wRKRAz21xZgPqkw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--b1DyO77d--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/4464/1%2AdO7Sxv9wRKRAz21xZgPqkw.png" alt="I’ve already deleted this"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Go to the “Security” tab. Under security groups there should be a link you can click on that looks like sg-randomletters (launch-wizard-3). On the next page, scroll to the bottom and go to “Edit Inbound Rules”.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--cFqA_hgi--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/4536/1%2AeMEsv6y6-wYs1DPJgWQODg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--cFqA_hgi--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/4536/1%2AeMEsv6y6-wYs1DPJgWQODg.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Add an all-traffic rule with the 0.0.0.0/0 CIDR block and then save. (This opens the instance to the entire internet; zero best practices, remember?)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--hv_y6eN4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/5236/1%2Akn_19Com8iu9vks9An6vag.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--hv_y6eN4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/5236/1%2Akn_19Com8iu9vks9An6vag.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Copy Your Files to the Instance
&lt;/h2&gt;

&lt;p&gt;Now our instance is ready to go, so let’s get our files over. To make it easy we can set two environment variables. &lt;code&gt;KEY&lt;/code&gt; is just the path to the &lt;code&gt;.pem&lt;/code&gt; file you downloaded earlier, and &lt;code&gt;HOST&lt;/code&gt; is the Public IPv4 DNS name you can view on the instances page.&lt;/p&gt;

&lt;p&gt;Edit the example below to contain your information. These commands assume macOS/Linux; for Windows, check out &lt;a href="https://www.putty.org/"&gt;PuTTY&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;HOST&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ec2-35-153-79-254.compute-1.amazonaws.com
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/PATH/TO/MY/KEY.pem
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we can copy our files over. Again if you don’t have an ML app, clone my slapdash one, and &lt;code&gt;cd&lt;/code&gt; into the directory.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone [https://github.com/kylegallatin/fast-bad-ml.git](https://github.com/kylegallatin/fast-bad-ml.git)
cd fast-bad-ml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Now change key perms and copy everything.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;chmod &lt;/span&gt;700 &lt;span class="nv"&gt;$KEY&lt;/span&gt;
scp &lt;span class="nt"&gt;-i&lt;/span&gt; &lt;span class="nv"&gt;$KEY&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;pwd&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt; ec2-user@&lt;span class="nv"&gt;$HOST&lt;/span&gt;:/home/ec2-user 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Type yes when prompted about the host’s authenticity and you’ll see the files copy over.&lt;/p&gt;

&lt;h2&gt;
  
  
  Set Up Your Instance
&lt;/h2&gt;

&lt;p&gt;Now it’s time to &lt;code&gt;ssh&lt;/code&gt; in and start a session so we can run our app.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ssh -i $KEY ec2-user@$HOST
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Running &lt;code&gt;pwd &amp;amp;&amp;amp; ls&lt;/code&gt; will show you that you’re in &lt;code&gt;/home/ec2-user&lt;/code&gt; and that the directory you copied over is there. Now &lt;code&gt;cd&lt;/code&gt; into that directory and set up Python (this assumes you have a requirements.txt).&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cd fast-bad-ml
sudo yum install python3
sudo python3 -m pip install -r requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Run and Test Your App
&lt;/h2&gt;

&lt;p&gt;Now that everything is installed, start your application on port 80 (default web traffic) using host 0.0.0.0 (this binds to all network interfaces; 127.0.0.1 only accepts local connections, so the app wouldn’t be reachable from the internet).&lt;/p&gt;

&lt;p&gt;The command below is the &lt;code&gt;uvicorn&lt;/code&gt; command for my FastAPI app, but you can replace that part to suit your app as long as host/port are the same.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo /usr/local/bin/uvicorn app:app --reload --port 80 --host 0.0.0.0 &amp;amp;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The &amp;amp; runs the process in the background so it keeps serving after the command returns. To be safe, prefix the command with &lt;code&gt;nohup&lt;/code&gt; so the process ignores the hangup signal when you exit the session, and its logs are redirected to nohup.out for later perusal.&lt;/p&gt;

&lt;p&gt;You can now reach the app from the internet. Use the same Public IPv4 DNS as earlier and just copy it into a browser. I’ve configured the / route to return a simple message.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--1qSO-gRo--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/5628/1%2AvCGYrJzffNoIhZnrE_TFug.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--1qSO-gRo--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/5628/1%2AvCGYrJzffNoIhZnrE_TFug.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you’ve exposed a /predict method that takes query parameters, you can pass those in with the URL too to get your prediction. The format is &lt;code&gt;$HOST/$ROUTE/?$PARAM1=X&amp;amp;$PARAM2=Y....&lt;/code&gt;&lt;/p&gt;
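&lt;p&gt;If you’re scripting against the endpoint, the stdlib can build that query string for you. The host and parameter names below are placeholders:&lt;/p&gt;

```python
# Build a prediction URL of the form $HOST/$ROUTE/?$PARAM1=X...
# Stdlib only; host and parameter names are placeholders.
from urllib.parse import urlencode

host = "http://ec2-35-153-79-254.compute-1.amazonaws.com"  # your Public IPv4 DNS
route = "predict"
params = {"feature1": 1, "feature2": 1}

url = f"{host}/{route}/?{urlencode(params)}"
print(url)
```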

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--mPsLOlWB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/5628/1%2AYAbL2wuPa1fS8VBrEmnmQQ.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--mPsLOlWB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/5628/1%2AYAbL2wuPa1fS8VBrEmnmQQ.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Just to caveat: this is nothing close to production. Even if we introduced Docker and the scale of Kubernetes, true “production” requires tests, automated CI/CD, monitoring, and much more. But for getting you to the demo on time? There’s nothing better. Good luck!&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>aws</category>
      <category>ec2</category>
      <category>fastapi</category>
    </item>
    <item>
      <title>Exposing Tensorflow Serving’s gRPC Endpoints on Amazon EKS</title>
      <dc:creator>Kyle Gallatin</dc:creator>
      <pubDate>Wed, 10 Feb 2021 15:13:45 +0000</pubDate>
      <link>https://dev.to/aws-builders/exposing-tensorflow-serving-s-grpc-endpoints-on-amazon-eks-57aa</link>
      <guid>https://dev.to/aws-builders/exposing-tensorflow-serving-s-grpc-endpoints-on-amazon-eks-57aa</guid>
      <description>&lt;p&gt;&lt;a href="https://www.tensorflow.org/tfx/guide/serving" rel="noopener noreferrer"&gt;Tensorflow serving&lt;/a&gt; is popular way to package and deploy models trained in the tensorflow framework for real time inference. Using the official docker image and a trained model, you can almost instantaneously spin up a container exposing REST and gRPC endpoints to make predictions.&lt;/p&gt;

&lt;p&gt;Most of the examples and documentation on tensorflow serving focus on the popular REST endpoint usage. Very few focus on how to adapt and use the gRPC endpoint for their own use case — and fewer mention how this works when you scale to kubernetes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F6016%2F0%2AxFKkSh9pPYYyeIrg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F6016%2F0%2AxFKkSh9pPYYyeIrg" alt="Rare footage of a real live docker whale — Photo by [Todd Cravens](https://unsplash.com/@toddcravens?utm_source=medium&amp;amp;utm_medium=referral) on [Unsplash](https://unsplash.com?utm_source=medium&amp;amp;utm_medium=referral)"&gt;&lt;/a&gt;&lt;em&gt;Rare footage of a real live docker whale — Photo by &lt;a href="https://unsplash.com/@toddcravens?utm_source=medium&amp;amp;utm_medium=referral" rel="noopener noreferrer"&gt;Todd Cravens&lt;/a&gt; on &lt;a href="https://unsplash.com?utm_source=medium&amp;amp;utm_medium=referral" rel="noopener noreferrer"&gt;Unsplash&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In this post I’ll scratch the surface of gRPC, kubernetes/EKS, nginx, and how we can use them with tensorflow serving.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why gRPC over REST?
&lt;/h2&gt;

&lt;p&gt;There are a ton of reasons. First off, gRPC uses the efficient HTTP/2 protocol as opposed to classic HTTP/1.1. It also uses language-neutral, serialized protocol buffers instead of JSON, which reduces the overhead of serializing and deserializing large JSON payloads.&lt;/p&gt;
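&lt;p&gt;A rough stdlib-only intuition for that serialization overhead, comparing text JSON to packed fixed-width floats (roughly how a protobuf encodes a repeated float field):&lt;/p&gt;

```python
# Compare the wire size of 1,000 floats as JSON text vs. packed 4-byte floats
# (a rough stand-in for a protobuf repeated float field).
import json
import struct

values = [i / 7 for i in range(1000)]

json_bytes = json.dumps(values).encode("utf-8")
packed_bytes = struct.pack(f"{len(values)}f", *values)

# JSON spells each float out as decimal text, so it's several times larger
print(len(json_bytes), len(packed_bytes))
```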

&lt;p&gt;Some talk about the benefits of the &lt;a href="https://cloud.google.com/blog/products/api-management/understanding-grpc-openapi-and-rest-and-when-to-use-them" rel="noopener noreferrer"&gt;API definition and design&lt;/a&gt;, and others about the &lt;a href="https://stackoverflow.com/questions/44877606/is-grpchttp-2-faster-than-rest-with-http-2" rel="noopener noreferrer"&gt;efficiencies of HTTP 2 in general&lt;/a&gt; — but here’s my experience with gRPC with regard to machine learning deployments:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;gRPC is crazy efficient. It can dramatically reduce inference time and the overhead of large JSON payloads by using protobufs&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;gRPC in production has a pretty big learning curve and was a massive pain in the a** to figure out&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In short gRPC can offer huge performance benefits. But as a more casual API developer, I can definitely say no — it is not “easier” than REST. For someone new to gRPC, HTTP 2 with nginx, TLS/SSL and tensorflow serving — there was &lt;em&gt;a lot&lt;/em&gt; to figure out.&lt;/p&gt;

&lt;p&gt;Still, the benefits are worth it. I saw up to an 80% reduction in inference time with large batch sizes during initial load tests. For groups with strict service level agreements (SLAs) around inference time, gRPC can be a lifesaver. While I won’t explain gRPC in more depth here, the internet is littered with &lt;a href="https://www.semantics3.com/blog/a-simplified-guide-to-grpc-in-python-6c4e25f0c506/#:~:text=Google's%20gRPC%20provides%20a%20framework,over%20conventional%20REST%2BJSON%20APIs." rel="noopener noreferrer"&gt;helpful tutorials&lt;/a&gt; to get you started.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setup a Kubernetes Cluster on AWS
&lt;/h2&gt;

&lt;p&gt;Let’s get started and set up a kubernetes cluster. We’ll use AWS EKS and the eksctl command line utility. For the most part, you can also follow along with kubernetes in Docker Desktop if you don’t have an AWS account.&lt;/p&gt;

&lt;p&gt;If you don’t have eksctl or aws-cli installed already you can use the &lt;a href="https://github.com/kylegallatin/eks-demos" rel="noopener noreferrer"&gt;docker image in this repository&lt;/a&gt;. It also comes with kubectl for interacting with our cluster. First, build and run the container.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker build -t eksctl-cli .
docker run -it eksctl-cli
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;Once you’ve started a bash session in the new container, sign into AWS with your preferred user and AWS access keys:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws configure
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;You’ll need an access key, a secret key, a default region name (I use us-east-1), and an output format like json. Now we can check on the status of our clusters.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;eksctl get clusters
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;If, like me, you have no active clusters, you’ll get No clusters found. In that case, let’s create one.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;eksctl create cluster \
--name test \
--version 1.18 \
--region us-east-1 \
--nodegroup-name linux-nodes \
--nodes 1 \
--nodes-min 1 \
--nodes-max 2 \
--with-oidc \
--managed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;You can vary the parameters if you like, but as this is just an example I’ve left the cluster size small. This may take a little while. If you go to the console you’ll see your cluster creating.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F4332%2F1%2AjuuQvi5KN3TL8IiO6Jkm4w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F4332%2F1%2AjuuQvi5KN3TL8IiO6Jkm4w.png" alt="Make sure you’re logged in with the proper user and looking at the right region"&gt;&lt;/a&gt;&lt;em&gt;Make sure you’re logged in with the proper user and looking at the right region&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Now that our cluster is complete, we can install nginx for ingress and load balancing. First, let’s check that kubectl is working as expected. Here are some example commands.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl config get-contexts
kubectl get nodes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;You should see the current cluster context as something like eksctl@test.us-east-1.eksctl.io and also see the node(s) created by our command. You can view similar content in the console if you’re signed in with proper permissions (i.e. the user you created the cluster with).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F4340%2F1%2A_PRILrg8MSSsdofynuJTTw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F4340%2F1%2A_PRILrg8MSSsdofynuJTTw.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Cluster’s all good, so let’s install nginx. Going back to the command line in our docker container:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v0.43.0/deploy/static/provider/aws/deploy.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;If you’re using a kubernetes provider other than AWS, check the &lt;a href="https://kubernetes.github.io/ingress-nginx/deploy/#aws" rel="noopener noreferrer"&gt;nginx installation instructions&lt;/a&gt; (there is one for Docker Desktop k8s). This command will create all the necessary resources for nginx. You can check it is up and running by looking at the pods in the new namespace.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get po -n ingress-nginx
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;What we’re most interested in is whether we’ve created a LoadBalancer for our cluster with an externally reachable IP address. If you run this command:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get svc -n ingress-nginx
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;You should see that nginx-ingress-controller has an external IP like something.elb.us-east-1.amazonaws.com. If you copy and paste that into a browser, you should see this.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2676%2F1%2AtPhUqReHfZXvXvfizeHMOw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2676%2F1%2AtPhUqReHfZXvXvfizeHMOw.png" alt="Never been so happy to get a 404"&gt;&lt;/a&gt;&lt;em&gt;Never been so happy to get a 404&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Hell yeah! I know it doesn’t look good but what we’ve actually done is create proper ingress into our cluster and we can expose things to the ~web~.&lt;/p&gt;
&lt;h2&gt;
  
  
  Deploy Tensorflow Serving on Kubernetes
&lt;/h2&gt;

&lt;p&gt;Tensorflow has some &lt;a href="https://www.tensorflow.org/tfx/serving/serving_kubernetes" rel="noopener noreferrer"&gt;passable docs&lt;/a&gt; on deploying models to kubernetes, and it’s easy to create your own image for serving a custom model. For ease, I’ve pushed the classic half_plus_two model to &lt;a href="https://hub.docker.com/repository/docker/kylegallatin/tfserving-example" rel="noopener noreferrer"&gt;dockerhub&lt;/a&gt; so that anyone can pull it for this demo.&lt;/p&gt;

&lt;p&gt;The following YAML defines the deployment and service for a simple k8s app that exposes the tfserving grpc endpoint.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
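&lt;p&gt;In outline, such a manifest pairs a Deployment running the kylegallatin/tfserving-example image with a Service exposing gRPC port 8500. This is a hedged sketch; names and labels may differ from the actual gist:&lt;/p&gt;

```yaml
# Sketch of a tfserving Deployment + Service exposing gRPC on 8500.
# Names, labels, and replica counts are illustrative; see the linked
# gist for the exact manifest.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tfserving-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tfserving
  template:
    metadata:
      labels:
        app: tfserving
    spec:
      containers:
      - name: tfserving
        image: kylegallatin/tfserving-example:latest
        ports:
        - containerPort: 8500  # gRPC
---
apiVersion: v1
kind: Service
metadata:
  name: tfserving-service
spec:
  selector:
    app: tfserving
  ports:
  - name: grpc
    port: 8500
    targetPort: 8500
```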



&lt;p&gt;To deploy it in our cluster, simply apply the raw YAML from the command line.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl apply -f [https://gist.githubusercontent.com/kylegallatin/734176736b0358c7dfe57b8e62591931/raw/ffebc1be625709b9912c3a5713698b80dc7925df/tfserving-deployment-svc.yaml](https://gist.githubusercontent.com/kylegallatin/734176736b0358c7dfe57b8e62591931/raw/ffebc1be625709b9912c3a5713698b80dc7925df/tfserving-deployment-svc.yaml)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;Another kubectl get po will show us whether our pod was created successfully. To make sure the servers started, check the logs. You should see things like Running gRPC ModelServer at 0.0.0.0:8500 and Exporting HTTP/REST API. The gRPC one is the only one we’ve made available via our pod/service.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl logs $POD_NAME 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;Before exposing through nginx, let’s make sure the service works. Forward the tensorflow serving service to your localhost.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl port-forward service/tfserving-service 8500:8500 &amp;amp;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;There are multiple ways to check that the service is running. The simplest is just establishing an insecure connection with the grpc Python client.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
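&lt;p&gt;The helper boils down to something like the sketch below: &lt;code&gt;grpc_server_on&lt;/code&gt; just waits for the channel to become ready. This assumes the grpcio package; the actual gist may differ:&lt;/p&gt;

```python
# Check whether a gRPC server is reachable over a channel.
# A sketch of the gist's grpc_server_on helper; requires the grpcio package.
import grpc

def grpc_server_on(channel, timeout: int = 5) -> bool:
    try:
        # Blocks until the channel is ready, or raises FutureTimeoutError
        grpc.channel_ready_future(channel).result(timeout=timeout)
        return True
    except grpc.FutureTimeoutError:
        return False

channel = grpc.insecure_channel("localhost:8500")  # the port-forwarded service
print(grpc_server_on(channel, timeout=2))
```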



&lt;p&gt;If you enter the Python shell in the docker container and run the above, you should be able to connect to the grpc service insecurely.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt;&amp;gt;&amp;gt; grpc_server_on(channel)
Handling connection for 8500
True
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;This means the service is working as expected. Let’s take a look at the more tensorflow-specific methods we can call. There should be a file called get_model_metadata.py in your docker container (if you’re following along elsewhere, &lt;a href="https://github.com/kylegallatin/eks-demos/blob/main/get_model_metadata.py" rel="noopener noreferrer"&gt;here’s the link&lt;/a&gt;). Let’s run that and inspect the output.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python get_model_metadata.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;



&lt;p&gt;Whoa, lots of unwieldy and inconvenient information. The most informative part is the {'inputs': 'x'... portion. This helps us formulate the proper prediction request. Note — what we’re actually doing in these Python scripts is using tensorflow-provided libraries to generate prediction protobufs and send them over our insecure gRPC channel.&lt;/p&gt;

&lt;p&gt;Let’s use this information to make a prediction over gRPC. You should also have a get_model_prediction.py file in your current directory.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;Run that and inspect the output. You’ll see a json-like response (it’s actually a protobuf message).&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;outputs {
  key: "y"
  value {
    dtype: DT_FLOAT
    tensor_shape {
      dim {
        size: 3
      }
    }
    float_val: 2.5
    float_val: 3.0
    float_val: 4.5
  }
}
model_spec {
  name: "model"
  version {
    value: 1
  }
  signature_name: "serving_default"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;We’ve made our first predictions over gRPC! Amazing. All of this is pretty poorly documented on the tensorflow serving side in my opinion, which can make it difficult to get started.&lt;/p&gt;

&lt;p&gt;In the next section, we’ll top it off by actually exposing our gRPC service via nginx and our public URL.&lt;/p&gt;
&lt;h2&gt;
  
  
  Exposing gRPC Services via Nginx
&lt;/h2&gt;

&lt;p&gt;The key thing that’ll be different about going the nginx route is that we can no longer establish insecure connections. We will be required by nginx to provide TLS encryption for our domain over port 443 in order to reach our service.&lt;/p&gt;

&lt;p&gt;Since nginx enables http2 protocol on port 443 by default, we shouldn’t have to make any changes there. However, if you have existing REST services on port 80, you may want to disable ssl redirects in the nginx configuration.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl edit configmap -n ingress-nginx ingress-nginx-controller
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;Then add:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;data:
  "ssl-redirect": "false"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;And save.&lt;/p&gt;
&lt;h3&gt;
  
  
  Create a TLS Secret
&lt;/h3&gt;

&lt;p&gt;To create TLS for your domain, you can do something &lt;a href="http://www.inanzzz.com/index.php/post/jo4y/using-tls-ssl-certificates-for-grpc-client-and-server-communications-in-golang-updated" rel="noopener noreferrer"&gt;akin to this&lt;/a&gt;. Proper TLS/SSL involves a certificate authority (CA), but that’s out of scope for this article.&lt;/p&gt;

&lt;p&gt;First create a cert directory and then create a conf file.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
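&lt;p&gt;As a sketch, the conf file looks something like this. The values are placeholders; only DNS.1 needs to match your ELB hostname:&lt;/p&gt;

```ini
# cert/cert.conf -- sketch of a self-signed cert config with a SAN entry.
# Values are placeholders; only DNS.1 needs editing.
[ req ]
default_bits       = 2048
prompt             = no
distinguished_name = dn
req_extensions     = req_ext

[ dn ]
CN = localhost

[ req_ext ]
subjectAltName = @alt_names

[ alt_names ]
DNS.1 = something.elb.us-east-1.amazonaws.com
```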



&lt;p&gt;&lt;strong&gt;Edit&lt;/strong&gt; DNS.1 &lt;strong&gt;so that it reflects the actual hostname of your ELB&lt;/strong&gt; (the URL we used earlier to see nginx in browser). You don’t need to edit CN.&lt;/p&gt;

&lt;p&gt;Using this, create a key and cert.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;openssl genrsa -out cert/server.key 2048
openssl req -nodes -new -x509 -sha256 -days 1825 -config cert/cert.conf -extensions 'req_ext' -key cert/server.key -out cert/server.crt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;Then use those to create a new kubernetes secret in the default namespace. We will use this secret in our ingress object.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CERT_NAME=tls-secret
KEY_FILE=cert/server.key
CERT_FILE=cert/server.crt
kubectl create secret tls ${CERT_NAME} --key ${KEY_FILE} --cert ${CERT_FILE}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;Finally, we create the ingress object with the necessary annotations for grpc. Note that we specify the tls-secret in the ingress. We’re also doing a path rewrite and exposing our service on /service1. By segmenting our services by path, we can expose more than one gRPC service via nginx.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
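&lt;p&gt;The ingress looks roughly like this (the backend service name, port, and hostname here are my placeholders; the gist yaml applied in the next step is the authoritative version):&lt;/p&gt;

```yaml
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: tfserving-ingress
  annotations:
    kubernetes.io/ingress.class: "nginx"
    # Tell nginx the backend speaks gRPC (HTTP/2), not plain HTTP
    nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
    # Rewrite /service1/<Method> to tensorflow serving's default service path
    nginx.ingress.kubernetes.io/rewrite-target: /tensorflow.serving.PredictionService/$2
    nginx.ingress.kubernetes.io/use-regex: "true"
spec:
  tls:
    - hosts:
        - your-elb-hostname.us-east-1.elb.amazonaws.com
      secretName: tls-secret
  rules:
    - host: your-elb-hostname.us-east-1.elb.amazonaws.com
      http:
        paths:
          - path: /service1(/|$)(.*)
            backend:
              serviceName: tf-serving
              servicePort: 8500
```

&lt;p&gt;The rewrite annotation is what lets each gRPC service live on its own path prefix while still reaching tensorflow serving’s default method path behind the scenes.&lt;/p&gt;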



&lt;p&gt;Replace the &lt;code&gt;host:&lt;/code&gt; value with your URL. You can apply the yaml above and then edit the resulting object, or vice versa. This way is pretty easy:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl apply -f [https://gist.githubusercontent.com/kylegallatin/75523d2d2ce2c463c653e791726b2ba1/raw/4dc91989d8bdfbc87ca8b5192f60c9c066801235/tfserving-ingress.yaml](https://gist.githubusercontent.com/kylegallatin/75523d2d2ce2c463c653e791726b2ba1/raw/4dc91989d8bdfbc87ca8b5192f60c9c066801235/tfserving-ingress.yaml)
kubectl edit tfserving-ingress
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;You now have an ingress in the default namespace. We can’t connect with an insecure channel like before, since this endpoint now requires TLS. We have to establish a secure connection using the certificate we just created.&lt;/p&gt;

&lt;p&gt;The nice part is that with this cert we can now connect from anywhere: we no longer need kubectl to port-forward the service to our container or local machine. It’s publicly accessible.&lt;/p&gt;
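&lt;p&gt;A client-side sketch of that secure connection (the hostname and cert path in the commented call are placeholders for your own):&lt;/p&gt;

```python
import grpc

def make_secure_channel(host: str, crt_path: str, port: int = 443) -> grpc.Channel:
    """Open a TLS channel to the ingress, trusting our self-signed cert."""
    with open(crt_path, "rb") as f:
        credentials = grpc.ssl_channel_credentials(root_certificates=f.read())
    # Channel creation is lazy; the TLS handshake happens on the first RPC
    return grpc.secure_channel(f"{host}:{port}", credentials)

# Placeholders -- substitute your ELB hostname and the cert generated above:
# channel = make_secure_channel("your-elb-hostname.us-east-1.elb.amazonaws.com",
#                               "cert/server.crt")
```

&lt;p&gt;Because the self-signed cert is passed as the root of trust, any machine holding &lt;code&gt;server.crt&lt;/code&gt; can reach the service over the public internet.&lt;/p&gt;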


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;



&lt;p&gt;Replace the &lt;code&gt;crt_path&lt;/code&gt; with the cert you’ve generated and replace the host with your URL.&lt;/p&gt;

&lt;p&gt;Note the custom route we’ve exposed our service on. TensorFlow Serving services are available on /tensorflow.serving.PredictionService by default, but this makes it difficult to add new services if we’re exposing them all on the same domain.&lt;/p&gt;

&lt;p&gt;gRPC itself only connects to a host and port, but we can use whatever service route we want. Above, I use the path we configured in our k8s ingress object, /service1, and &lt;a href="https://github.com/tensorflow/serving/blob/master/tensorflow_serving/apis/prediction_service_pb2_grpc.py" rel="noopener noreferrer"&gt;overwrite the base configuration provided by tensorflow serving&lt;/a&gt;. When we call the tfserving_metadata function above, we specify /service1 as an argument.&lt;/p&gt;
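&lt;p&gt;Since a gRPC method is ultimately just a path string like &lt;code&gt;/tensorflow.serving.PredictionService/GetModelMetadata&lt;/code&gt;, overriding the route amounts to swapping the service prefix. A small illustrative helper (the names here are mine, not part of the tensorflow serving API):&lt;/p&gt;

```python
def override_route(method: str, route: str = "/service1") -> str:
    """Build a gRPC method path on our custom ingress route instead of
    the default /tensorflow.serving.PredictionService prefix."""
    return f"{route}/{method}"

# With a secure channel in hand, you would register the rewritten path
# roughly like this (illustrative; the serializers come from the generated
# tensorflow_serving.apis modules):
# get_metadata = channel.unary_unary(
#     override_route("GetModelMetadata"),
#     request_serializer=get_model_metadata_pb2.GetModelMetadataRequest.SerializeToString,
#     response_deserializer=get_model_metadata_pb2.GetModelMetadataResponse.FromString,
# )
```

&lt;p&gt;nginx then rewrites /service1/GetModelMetadata back to the default service path before it reaches tensorflow serving, so the backend never knows the difference.&lt;/p&gt;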

&lt;p&gt;This also applies to making predictions: with the right host, route, and cert, we can easily establish a secure channel and make predictions to our service over it.&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;


&lt;p&gt;Again, we overwrite the route with our custom path, convert our data into a tensor proto, and make a prediction! Easy as hell…haha, not really. You won’t find any of this in the tensorflow docs (or at least I didn’t), and the addition of unfamiliar tools like TLS, gRPC, and HTTP/2 with nginx makes it even more difficult.&lt;/p&gt;

&lt;p&gt;When you’re done, delete your cluster so you don’t get charged.&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;eksctl delete cluster test&lt;br&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Ay We’re Done
&lt;/h2&gt;

&lt;p&gt;Hope you find this helpful; it was definitely difficult to piece together from various articles, source code, Stack Overflow questions, and GitHub issues.&lt;/p&gt;

&lt;p&gt;Probably worth noting we didn’t do any actual CI/CD here, and I wouldn’t recommend &lt;code&gt;kubectl edit&lt;/code&gt; in production, but those are topics for other articles. If I used any terminology improperly, or there’s something you think could be explained better, please reach out! Happy to connect on &lt;a href="https://www.linkedin.com/in/kylegallatin/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; or &lt;a href="https://twitter.com/kylegallatin" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt;, or just go at it in the Medium comments section.&lt;/p&gt;

&lt;p&gt;✌🏼&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>kubernetes</category>
      <category>aws</category>
      <category>grpc</category>
    </item>
  </channel>
</rss>
