In this post, I will explain how to expose a trained machine learning model as an API in Python using FastAPI, and how to use CI/CD best practices (GitHub Actions) and infrastructure as code (Terraform) to automate the infrastructure creation.
Prerequisites
- Docker Desktop
- Git
- GitHub account
- Google Cloud Platform account with owner permissions
- Clone this repo
Google Cloud Run
Cloud Run is a serverless platform on Google Cloud for deploying and running containers. It can be used to serve RESTful web APIs, WebSocket applications, or microservices connected by gRPC.
In this project we will need:
- An IAM account with permissions to create a service account
- Cloud Storage Admin permissions
- Container Registry Admin permissions
- Cloud Run Admin permissions
If you don't want to expose the API for public access, remove the resource "google_cloud_run_service_iam_member" "run_all_users" from terraform/main.tf.
You can then grant access to specific IAM accounts using the Google Cloud Run UI or Terraform. This approach adds no latency for the client because it relies on Google Cloud's built-in IAM roles and permissions.
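Once public access is removed, callers must send a Google-signed ID token with each request. Below is a minimal Python sketch of such a client; it assumes the google-auth and requests packages, credentials with the Cloud Run Invoker role, and a placeholder service URL, so treat it as an illustration rather than the repo's actual code:

# Sketch: calling a private Cloud Run service with an ID token.
import requests
import google.auth.transport.requests
import google.oauth2.id_token

SERVICE_URL = "https://your-service-url.a.run.app"  # hypothetical URL

# Fetch an ID token for the service (requires ambient credentials,
# e.g. a service account key or the metadata server).
auth_request = google.auth.transport.requests.Request()
token = google.oauth2.id_token.fetch_id_token(auth_request, SERVICE_URL)

response = requests.post(
    f"{SERVICE_URL}/predict/",
    json={"test_array": [0] * 37},
    headers={"Authorization": f"Bearer {token}"},
)
print(response.status_code, response.json())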
Terraform
Terraform is a popular open-source tool for managing infrastructure as code. It uses HCL (HashiCorp Configuration Language), a declarative language for describing infrastructure.
The basic flow is:
- terraform init: Initializes the plugins, the backend, and the configuration files Terraform uses to keep track of the infrastructure.
- terraform plan: Generates an execution plan for all the infrastructure declared in terraform/main.tf.
- terraform apply: Applies all the changes from the plan.
All these steps are declared in .github/workflows/workflow.yaml.
Machine Learning Model
It's a logistic regression model serialized as a pickle file. It takes a list of 37 parameters as input and returns a number between 0 and 1, the probability of the flight being delayed. E.g.:
{
  "test_array": [0,0,0,0,0,1,0,1,0,0,1,0,1,0,1,0,1,0,1,0,0,1,0,0,1,0,1,0,0,1,1,1,0,1,0,1,0]
}
Returns: 0
In this case, it means the flight isn't predicted to be delayed.
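For context, here is a minimal sketch of how such a model could be exposed with FastAPI. The file names (main.py, model.pkl), the Pydantic schema, and the endpoint wiring are my assumptions for illustration; the actual code lives in the repo:

# main.py - minimal FastAPI serving sketch (assumed names)
import pickle
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Load the serialized logistic regression model once at startup
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

class PredictionInput(BaseModel):
    test_array: list[float]  # the 37 input parameters

@app.post("/predict/")
def predict(payload: PredictionInput):
    # scikit-learn expects a 2D array: one row per sample
    prediction = model.predict([payload.test_array])
    return {"prediction": int(prediction[0])}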
Run it locally
- Fork the repo
- Clone it to your computer
- Run
docker build -t ml-api .
in the root of the project to build the image of the API.
- Run
docker run -d --name ml -p 80:8080 ml-api
to create a container from the ml-api image you just built.
- Open localhost in your browser to test the project (FastAPI also serves interactive docs at /docs by default).
- On the /predict/ POST endpoint, you can use this body as an example:
{
  "test_array": [0,0,0,0,0,1,0,1,0,0,1,0,1,0,1,0,1,0,1,0,0,1,0,0,1,0,1,0,0,1,1,1,0,1,0,1,0]
}
- You should expect a 200 response with
"prediction": 0
which means the flight isn't predicted to be delayed.
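If you'd rather test from a script than the browser, a small Python client could look like the sketch below (it assumes the requests package and the port mapping from the docker run command above):

import requests

# The 37 binary features from the example body above
test_array = [0,0,0,0,0,1,0,1,0,0,1,0,1,0,1,0,1,0,1,0,0,1,0,0,1,0,1,0,0,1,1,1,0,1,0,1,0]

response = requests.post(
    "http://localhost/predict/",  # port 80 is mapped to the container's 8080
    json={"test_array": test_array},
)
response.raise_for_status()
print(response.json())  # expected: {"prediction": 0}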
How to deploy it with your GCP account
- Generate a service account key and upload it to GitHub Secrets as
GCLOUD_SERVICE_KEY
- Push any change to the main branch
- That's it! :)
A bit of stress testing for the API
- On macOS, install wrk:
brew install wrk
- Run
wrk -t12 -c200 -d45s -s request.lua https://mlops-api-backend-1-5gdi5qltoq-uc.a.run.app/predict/
to open 12 threads with 200 open HTTP connections for 45 seconds. The request.lua script passed with -s configures the POST request (method, body, and headers) that wrk sends.
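If you can't install wrk, a rough Python alternative (my own sketch, not part of the original setup) is to fire concurrent requests with a thread pool; wrk will give you more rigorous numbers, but this is enough for a quick sanity check:

import time
from concurrent.futures import ThreadPoolExecutor
import requests

URL = "http://localhost/predict/"  # or your Cloud Run URL
BODY = {"test_array": [0] * 37}    # dummy 37-feature input

def call(_):
    return requests.post(URL, json=BODY, timeout=10).status_code

start = time.time()
with ThreadPoolExecutor(max_workers=50) as pool:
    statuses = list(pool.map(call, range(1000)))  # 1000 requests, 50 in flight
elapsed = time.time() - start
print(f"{len(statuses)} requests in {elapsed:.1f}s, "
      f"{statuses.count(200)} returned 200")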
How can we improve the results?
The best approach would be horizontal scaling: in this case, we can create a second Google Cloud Run instance and use load balancing to distribute the traffic between both instances.