How can you deploy a machine learning model without provisioning ANY servers?
UPDATE: This guide is also referenced in the fast.ai course v3 (Part 1). The only difference is that the fast.ai course version doesn’t mention or use the Recommendations (pipenv & pyenv), making it a bit less opinionated. Otherwise the two guides are identical.
Table of Contents
- FaaS - Function as a Service (aka serverless)
- Microsoft Azure Functions
- Requirements
- Recommendations
- 1 - Local Setup
- 2 - Docker Setup
- 3 - Azure Setup
- Conclusion
- References:
In the previous article, we looked at different ways to deploy a trained machine learning model for a mobile app, including running inference on the mobile device itself and on various cloud-based architectures (IaaS, VPS, PaaS, ML PaaS).
In this article, I will explore the serverless architecture, the newest kid on the block: what its characteristics are, who the major service providers are, and how to implement a simple image classifier in fastai/PyTorch using one of them.
FaaS - Function as a Service (aka serverless)
This category of server implementation brings PaaS to a whole new level.
You write your code as a function. The function can access resources you set up with the cloud provider, such as online storage for photos, and you set up events that trigger the function to run.
There are four main advantages to going serverless:
- no need to provision or manage any hardware
- no need to pay for any idle resource time
- infrastructure can scale automatically depending on load
- availability and fault tolerance of the servers are built in
This is an attractive list of qualities. Sounds like everyone should be going serverless.
But should you?
In reality, there are hidden costs in both dollars and time that you should be aware of, though these costs only become problematic if your app is heavily utilized (I mean millions of calls a month). First world problems.
What is more relevant for your app are the limitations imposed by the cloud provider, which can make your deployment problematic. Some of the main ones are:
- Supported Languages
  - Since the responsibility of setting up and maintaining the software framework that runs your code falls to the cloud provider, they have to make the most of their resources and only support the most popular languages and frameworks. Your choices will be limited, down to the version number.
- Storage And Memory Limitations
  - You are usually limited in the amount of disk space and memory that your code has access to. This is especially a problem for ML applications because:
    - the application usually has a long list of dependencies and sub-dependencies (besides an ML framework such as scikit-learn, PyTorch, or TensorFlow, there are also dependencies such as numpy, pandas, etc.)
    - the model file that contains the pre-trained weights can be big
- Time Limitation
  - Each function is allowed a certain amount of time to run (usually 5-10 minutes) before it is forced to terminate.
Serverless is still a new approach to the cloud, and both companies and developers are only beginning to embrace it. However, there are already a lot of service providers to choose from. We can define two categories of serverless service providers:
- those that own the hardware and provide an API for access
- those that do not own any hardware but provide their own API on top of the first category's hardware
Here we have a list of the major providers from the first category:
| Provider | Python Runtime Version | Deployment Package Size | Memory | Timeout |
|---|---|---|---|---|
| AWS Lambda | 2.7, 3.6, 3.7 | 50 MB (compressed), 250 MB (uncompressed) | 3 GB | 900 sec |
| Google Cloud Functions | 3.7.1 (beta) | Source: 100 MB (compressed), Source + Modules: 500 MB (uncompressed) | 2 GB | 540 sec |
| IBM OpenWhisk | 2.7.15, 3.6.8, 3.7.2 | 48 MB | 2 GB | 600 sec |
| Microsoft Azure Functions | 3.6 (preview) | ? | 1.5 GB | 600 sec (Consumption Plan), Unlimited (App Service Plan) |
For a simple image classification app, the function shouldn't have any problem staying within the memory and timeout limits. What might be a problem is the size of the deployment package: uploading the deployment package directly to the serverless architecture will probably fail, as it is likely to be bigger than the limits above.
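To get a rough sense of whether your environment would fit, you can measure the size of your installed packages and compare it against the limits in the table. The following is a quick sketch of my own; run it inside your project's environment, and keep in mind it does not include your model file.

import sysconfig
from pathlib import Path

# sum the size of everything installed in site-packages as a rough proxy for
# the uncompressed deployment package size (model file not included)
site_packages = Path(sysconfig.get_paths()["purelib"])
total_bytes = sum(f.stat().st_size for f in site_packages.rglob("*") if f.is_file())
print(f"{site_packages}: {total_bytes / 1024 ** 2:.1f} MB")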
A workaround to the disk space limitation is to strip down the ML libraries so that you are only left with what's absolutely needed. Additionally, you can separate the libraries into submodules so that each module fits into its own cloud function; an inference call would then trigger a chain of cloud functions, with the last function returning the prediction result to you.
While these methods work, they introduce another problem: because slimming down the ML libraries isn't officially supported, extra work is needed every time you want to upgrade a library to its latest version. Given the fast-paced development of all the ML frameworks today, this might not be a very sustainable solution.
There is an interesting method (and probably the proper method) of using AWS Lambda Layers with AWS Lambda to bypass the storage limits. AWS Lambda Layers lets you organize and store dependency libraries as ZIP archives in AWS. These archives can be pulled in by a Lambda function as needed, keeping the Lambda function's own deployment package to a minimum and avoiding the 250 MB (uncompressed) size limit. Layers can be made public and shared, and there is a public layer containing PyTorch v1 running on Python 3.6 that the aforementioned method uses.
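As a rough illustration of how a layer gets attached to a function, here is a generic boto3 sketch. The function name and layer ARN are placeholders, not the exact values from the method referenced above.

import boto3

# attach a previously published layer (e.g. a public PyTorch layer) to an
# existing Lambda function; replace the placeholders with real values
lambda_client = boto3.client("lambda")
lambda_client.update_function_configuration(
    FunctionName="<YOUR_LAMBDA_FUNCTION>",
    Layers=["arn:aws:lambda:<REGION>:<ACCOUNT_ID>:layer:<PYTORCH_LAYER_NAME>:<VERSION>"],
)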
Note from the table above that Microsoft Azure Functions doesn't state any deployment package size limit.
Let’s continue and look at the second category of serverless providers:
| Provider | Remarks |
|---|---|
| Zappa | Python-only API wrapper for AWS Lambda |
| Zeit | API wrapper around AWS Lambda |
| Kubeless | Kubernetes-centric API wrapper that supports a number of serverless providers |
| Serverless Framework | API wrapper that supports most major serverless providers |
These providers offer an API wrapper that attempts to make the serverless experience friendlier, adding extra value where they see fit. Providers like Serverless Framework and Kubeless support multiple serverless infrastructure providers (our first category). This makes them especially useful, because you can use one API to deploy to any of their supported providers, which helps mitigate the problem of provider lock-in.
Out of these providers, Serverless Framework seems the most interesting, because its free API wrapper supports the most serverless infrastructure providers, in a number of languages. It also has a large community that has written many plugins which add extra functionality to the core API wrapper.
Let's use Serverless to deploy to AWS Lambda (without using Layers) and Google Cloud Functions, and see what problems we might encounter:
fastai Doesn’t Compile In Windows WSL Ubuntu Using pip (unexpected)
All the serverless architectures require the use of pip and requirements.txt to install dependencies, so I couldn't use conda to install fastai. This led to a lot of compilation issues that never came up when I used conda, which I found somewhat surprising as I had never encountered differences between Ubuntu on WSL and regular Ubuntu before. In hindsight it makes sense: I had only used WSL Ubuntu for Ruby or Node development, whereas for Python development under Windows I had always used conda, which takes care of the libraries with more complicated compilation requirements.
Storage Limits From AWS Lambda and Google Cloud Functions Are Too Small (expected)
Once the compilation problems went away after I started deploying from a real Ubuntu machine, I began hitting the storage limits.
Unfortunately, the trained model file for MNIST is already 80 MB. When you add fastai, PyTorch, and their dependencies, there's no way everything can fit in Google Cloud Functions, or in AWS Lambda, even if you compress the libraries and remove unnecessary files by enabling the slim package option.
Microsoft Azure Functions
Let's turn our attention to Azure. It wasn't the first choice because the Serverless Framework documentation lacked Python implementation examples and its Azure plugin hadn't been updated for a while. This is in stark contrast to the official Azure documentation, which is detailed and has good support for Python. Perhaps Azure has been moving along so quickly that the wrapper APIs haven't had time to catch up yet.
In order to try Azure, we will need to forego the Serverless Framework (and all the benefits that a wrapper API provides) and use Azure's own tooling directly. It's worth a try.
Pricing
Microsoft Azure Functions offers two kinds of pricing: the Consumption plan and the App Service plan. The main difference is that the Consumption plan lets you pay only when your function runs. It will scale the architecture for you if needed, but you don't have any control over how it scales. See here for the Consumption plan pricing.
With the App Service plan, you pick the level of computing resources that you want your function to run on. You are then charged for as long as those resources are defined, regardless of whether your function is running or not. See here for the App Service plan pricing.
Currently, Python is still in the preview stage on Azure Functions, and fastai only works when you provide your own custom Docker image on the App Service plan.
Requirements
Software
- real Linux (Windows WSL Ubuntu isn't sufficient; the steps below use Ubuntu 18.04)
- Docker (to compile fastai dependencies that don’t support manylinux-compatible wheels from PyPI e.g. Bottleneck)
- Python 3.6 (the only Python runtime currently supported by Azure Functions)
- Azure Functions Core Tools version 2.x
- Azure CLI
Accounts
- Microsoft Azure (for creating the Azure resources and deploying the function)
- Docker Hub (for hosting the Docker image that Azure will pull)
Recommendations
- pipenv (Azure Functions requires a virtualenv, so you might as well use pipenv, which uses virtualenv underneath)
- pyenv (in case your default Python version isn't 3.6; besides, pyenv is natively supported by pipenv)
1 - Local Setup
Setup Project Directory
Replace <PROJECT_DIR> with your own project directory name.
mkdir <PROJECT_DIR>
cd <PROJECT_DIR>
pipenv --python 3.6
pipenv shell
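Optionally, you can confirm that the virtual environment really is on Python 3.6, the only runtime Azure Functions currently supports. This quick sanity check is my own addition, not a step from the original guide.

import sys

# run inside the pipenv shell created above; anything other than 3.6.x means
# pipenv/pyenv picked up an interpreter Azure Functions does not yet support
assert sys.version_info[:2] == (3, 6), sys.version
print(sys.version)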
Create Azure Functions project
Create an Azure Functions project that uses the Python runtime. This will generate several files in <PROJECT_DIR>.
func init --docker
When prompted, select python:
Select a worker runtime: python
Create Azure Function
Create a function with the name <FUNCTION_NAME> using the HttpTrigger template. Replace <FUNCTION_NAME> with your own function name.
func new --name <FUNCTION_NAME> --template "HttpTrigger"
Install fastai & Dependencies
Add Azure's dependencies (from the requirements.txt generated by func init) to your Pipfile.
pipenv install -r requirements.txt
Install fastai and any other dependencies your app needs in the virtual environment.
pipenv install fastai
Then output all the dependencies to requirements.txt, which will be used when you build the Docker image.
pipenv lock -r > requirements.txt
Update Function
Modify the following files in the directory:
/<FUNCTION_NAME>/__init__.py
This is where your inference function lives. The following is an example of using a trained image classification model.
import logging
import os
import azure.functions as func
from fastai.vision import *
import requests


def main(req: func.HttpRequest) -> func.HttpResponse:
    # load the exported learner (export.pkl) from the function's working directory
    path = Path.cwd()
    learn = load_learner(path)

    # download the image from the URL supplied in the request body
    request_json = req.get_json()
    r = requests.get(request_json['url'])
    if r.status_code == 200:
        temp_image_name = "temp.jpg"
        with open(temp_image_name, 'wb') as f:
            f.write(r.content)
    else:
        return func.HttpResponse(f"Image download failed, url: {request_json['url']}")

    # run inference and return the predicted class
    img = open_image(temp_image_name)
    pred_class, pred_idx, outputs = learn.predict(img)
    return func.HttpResponse(f"request_json['url']: {request_json['url']}, pred_class: {pred_class}")
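Note that the example above calls load_learner on every request. An optional variation (a sketch of my own, not part of the guide) is to load the learner once at module import, so that warm invocations reuse it instead of re-reading export.pkl each time:

import azure.functions as func
import requests
from fastai.vision import *

# load the exported learner once per worker process instead of per request
learn = load_learner(Path.cwd())

def main(req: func.HttpRequest) -> func.HttpResponse:
    url = req.get_json()["url"]
    r = requests.get(url)
    if r.status_code != 200:
        return func.HttpResponse(f"Image download failed, url: {url}")
    with open("temp.jpg", "wb") as f:
        f.write(r.content)
    pred_class, pred_idx, outputs = learn.predict(open_image("temp.jpg"))
    return func.HttpResponse(f"url: {url}, pred_class: {pred_class}")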
/<FUNCTION_NAME>/function.json
Update the function authorization so that it can be called without any additional security key. Replace the corresponding line in the file with the following:
...
"authLevel": "anonymous",
...
export.pkl
Copy your trained model file export.pkl to <PROJECT_DIR>.
Test Function
Run the following command to start the function on your local machine:
func host start
This will give you an output with the URL for testing:
Now listening on: http://0.0.0.0:7071
Application started. Press Ctrl+C to shut down.
Http Functions:
<FUNCTION_NAME>: [GET,POST] http://localhost:7071/api/<FUNCTION_NAME>
Check Test Outputs
To check that your function is running properly, visit http://localhost:7071 in your browser and you should see the Azure Functions welcome page.
You can send an HTTP POST request to http://localhost:7071/api/<FUNCTION_NAME> to check that your inference function is working. Replace <URL_TO_IMAGE> with a URL that points to an image to run inference on.
POST http://localhost:7071/api/<FUNCTION_NAME> HTTP/1.1
content-type: application/json
{
"url": "<URL_TO_IMAGE>"
}
You should then see an HTTP response:
HTTP/1.1 200 OK
Connection: close
Date: Sun, 17 Mar 2019 06:30:29 GMT
Content-Type: text/plain; charset=utf-8
Server: Kestrel
Content-Length: 216
request_json['url']: <URL_TO_IMAGE>, pred_class: <PREDICTED_CLASS>
You should see the class that your inference function predicts in place of <PREDICTED_CLASS>.
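If you would rather script this check than hand-craft the HTTP request, here is a minimal sketch using the requests library. The function name and image URL are placeholders, and the same script works against the Docker and Azure URLs later by swapping the endpoint.

import requests

# placeholders: replace with your function name and a link to a test image
endpoint = "http://localhost:7071/api/<FUNCTION_NAME>"
payload = {"url": "<URL_TO_IMAGE>"}

resp = requests.post(endpoint, json=payload)
print(resp.status_code)  # expect 200
print(resp.text)         # expect the predicted class in the response body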
You can press Ctrl+C to stop the testing when you're ready.
2 - Docker Setup
Build Docker image
You can now build the Docker image that will contain your app and all the Python libraries it needs to run.
docker build --tag <DOCKER_HUB_ID>/<DOCKER_IMAGE_NAME>:<TAG> .
Test Docker image
Start the Docker image on your local machine for testing.
docker run -p 8080:80 -it <DOCKER_HUB_ID>/<DOCKER_IMAGE_NAME>:<TAG>
Your app in the Docker image is now running at localhost:8080. You can run the same tests as in Check Test Outputs with this new URL, and you should see the same test output as before.
You can press Ctrl+C to stop the testing when you're ready.
Push Docker image to Docker Hub
Log in to Docker from the command prompt. Enter your Docker Hub password when prompted.
docker login --username <DOCKER_HUB_ID>
You can now push the Docker image created earlier to Docker Hub.
docker push <DOCKER_HUB_ID>/<DOCKER_IMAGE_NAME>:<TAG>
3 - Azure Setup
Setup Azure Resources
Log in to Microsoft Azure with the Azure CLI if you haven't already.
az login
Execute the following commands to create Azure resources and run the inference app on Azure Functions.
The following example uses the lowest pricing tier, B1.
Replace the following placeholders with your own names:
- <RESOURCE_GROUP>
  - name of the Resource Group that all the other Azure resources created for this app will fall under
  - e.g. ResourceGroup
- <LOCATION_ID>
  - run the following command to see the list of available locations:
    az appservice list-locations --sku B1 --linux-workers-enabled
  - e.g. centralus
- <STORAGE_ACCOUNT>
  - name of the Azure Storage Account, which is a general-purpose account used to maintain information about your function
  - must be between 3 and 24 characters in length and may contain numbers and lowercase letters only
  - e.g. inferencestorage
- <APP_SERVICE_PLAN>
  - name of the App Service Plan that your Function App will run on
  - e.g. inferenceplan
- <FUNCTION_APP>
  - name of the Azure Function App that you will be creating
  - will be the default DNS domain and must be unique across all apps in Azure
  - e.g. inferenceapp123
Create Resource Group
az group create \
--name <RESOURCE_GROUP> \
--location <LOCATION_ID>
Create Storage Account
az storage account create \
--name <STORAGE_ACCOUNT> \
--location <LOCATION_ID> \
--resource-group <RESOURCE_GROUP> \
--sku Standard_LRS
Create a Linux App Service Plan
az appservice plan create \
--name <APP_SERVICE_PLAN> \
--resource-group <RESOURCE_GROUP> \
--sku B1 \
--is-linux
Create the App & Deploy the Docker image from Docker Hub
az functionapp create \
--resource-group <RESOURCE_GROUP> \
--name <FUNCTION_APP> \
--storage-account <STORAGE_ACCOUNT> \
--plan <APP_SERVICE_PLAN> \
--deployment-container-image-name <DOCKER_HUB_ID>/<DOCKER_IMAGE_NAME>:<TAG>
Configure the Function App
The following assumes that the Docker image you pushed earlier to your Docker Hub profile is public. If you set it to private, see here for how to add your Docker credentials so that Azure can access the image.
storageConnectionString=$(az storage account show-connection-string \
--resource-group <RESOURCE_GROUP> \
--name <STORAGE_ACCOUNT> \
--query connectionString --output tsv)
az functionapp config appsettings set --name <FUNCTION_APP> \
--resource-group <RESOURCE_GROUP> \
--settings AzureWebJobsDashboard=$storageConnectionString \
AzureWebJobsStorage=$storageConnectionString
Run your Azure Function
After the previous command, it generally takes 15-20 minutes for the app to deploy on Azure. You can also see your app in the Microsoft Azure Portal under Function Apps.
The URL for your app will be:
https://<FUNCTION_APP>.azurewebsites.net/api/<FUNCTION_NAME>
You can run the same tests as in Check Test Outputs with this new URL, and you should see the same output as before.
Delete Resource Group
When you are done, delete the Resource Group.
az group delete \
--name <RESOURCE_GROUP> \
--yes
Remember that with the App Service plan, you are charged for as long as your resources exist, even if you are not calling the function. So it is best to delete the Resource Group when you are not using the function, to avoid unexpected charges.
Conclusion
Microsoft Azure Functions seems to be the simplest to deploy to, without the need for any manual tinkering with our inference code or with the ML library. Unfortunately, its App Service plan works like other non-serverless pricing and foregoes one of the major advantages of a serverless architecture: paying for resources only when the function runs. But as serverless solutions mature, it won't be long before we can run PyTorch functions the way we were promised.
Feel free to let me know in the comments if you know of other ways to deploy fastai/PyTorch on a serverless architecture.
References:
Create your first Python function in Azure (preview)
Create a function on Linux using a custom image
Azure Functions Python developer guide