DEV Community

Alvaro Valarezo de la Fuente

Deploy an ML model using Google Cloud Run, GitHub Actions and Terraform

In this post, I will explain how to expose a trained model as an API in Python using FastAPI, and how to use CI/CD best practices (GitHub Actions) and infrastructure as code (IaC, with Terraform) to automate infrastructure creation.

Prerequisites

  • Docker Desktop
  • Git
  • GitHub account
  • A Google Cloud Platform project where you have Owner permissions
  • Clone this repo

Google Cloud Run

Cloud Run is Google Cloud's serverless platform for deploying and running containers. It can serve RESTful web APIs, WebSocket applications, or microservices that communicate over gRPC.

For this project, you will need:

  1. An IAM account with permission to create service accounts
  2. Cloud Storage Admin permissions
  3. Container Registry Admin permissions
  4. Cloud Run Admin permissions

If you don't want the API to be publicly accessible, remove the resource "google_cloud_run_service_iam_member" "run_all_users" from terraform/main.tf.


Instead, grant access only to specific IAM accounts, either in the Cloud Run console or with Terraform. This approach adds no extra latency for clients because access control is enforced by Google Cloud's built-in IAM roles and permissions.

Terraform

Terraform is a popular infrastructure-as-code tool. Infrastructure is described in HCL, a declarative configuration language.
The basic flow is:

  • terraform init: initializes the working directory, downloading the provider plugins and configuring the backend where Terraform stores the state it uses to keep track of the infrastructure.
  • terraform plan: generates an execution plan with the changes needed to make the real infrastructure match the configuration in terraform/main.tf.
  • terraform apply: applies the changes from the plan.

All of these steps are declared in .github/workflows/workflow.yaml, the GitHub Actions workflow that runs the deployment.
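The workflow file itself isn't reproduced in the post, so here is only a sketch of the same init -> plan -> apply order as a small Python helper. It assumes the configuration lives in the terraform/ directory and uses a hypothetical plan file name, tfplan. The flags (-chdir, -input=false, -out) are standard Terraform CLI options; by default the helper only prints the commands instead of running them.

```python
import subprocess
from typing import List

# Assumptions: Terraform config lives in ./terraform; "tfplan" is a hypothetical plan file name.
TF_DIR = "terraform"
PLAN_FILE = "tfplan"


def terraform_commands(tf_dir: str = TF_DIR, plan_file: str = PLAN_FILE) -> List[List[str]]:
    """Return the init -> plan -> apply sequence as argv lists."""
    base = ["terraform", f"-chdir={tf_dir}"]  # run every command inside tf_dir
    return [
        base + ["init", "-input=false"],
        # Save the plan so that apply executes exactly what plan showed.
        base + ["plan", "-input=false", f"-out={plan_file}"],
        # Applying a saved plan file does not ask for interactive confirmation.
        base + ["apply", "-input=false", plan_file],
    ]


def run_terraform(dry_run: bool = True) -> List[List[str]]:
    """Print (dry run) or execute each step in order; check=True stops at the first failure."""
    commands = terraform_commands()
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_terraform(dry_run=True)
```

Saving the plan with -out and then applying that file is the usual CI pattern: nobody is there to answer the confirmation prompt, and apply cannot drift from the reviewed plan.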

Machine Learning Model

It's a logistic regression model serialized as a pickle file. It takes a list of 37 features as input and estimates the probability (a number between 0 and 1) that the flight will be delayed, which determines the predicted outcome.

For example, this input:

  {
    "test_array": [0,0,0,0,0,1,0,1,0,0,1,0,1,0,1,0,1,0,1,0,0,1,0,0,1,0,1,0,0,1,1,1,0,1,0,1,0]
  }

Returns: 0
In this case, it means the flight is predicted not to be delayed.
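The post doesn't show how the model was trained or which library produced the pickle, so the following is only a hypothetical, dependency-free illustration of what a logistic regression does with the 37 inputs: compute a weighted sum, squash it with the sigmoid into a probability between 0 and 1, and threshold it (0.5 is assumed here) into a 0/1 label. The class name, weights and bias are made up; they are not the trained model's parameters.

```python
import math
import pickle
from typing import Sequence

N_FEATURES = 37  # input size stated in the post


class TinyLogisticModel:
    """Hypothetical stand-in for the pickled model (NOT the trained model)."""

    def __init__(self, weights: Sequence[float], bias: float, threshold: float = 0.5):
        if len(weights) != N_FEATURES:
            raise ValueError(f"expected {N_FEATURES} weights, got {len(weights)}")
        self.weights = list(weights)
        self.bias = bias
        self.threshold = threshold  # assumed decision threshold

    def predict_proba(self, features: Sequence[float]) -> float:
        """Probability of delay: sigmoid(w . x + b), a number between 0 and 1."""
        if len(features) != N_FEATURES:
            raise ValueError(f"expected {N_FEATURES} features, got {len(features)}")
        z = sum(w * x for w, x in zip(self.weights, features)) + self.bias
        return 1.0 / (1.0 + math.exp(-z))

    def predict(self, features: Sequence[float]) -> int:
        """Class label: 1 = delayed, 0 = not delayed."""
        return int(self.predict_proba(features) >= self.threshold)


# Made-up parameters, only so the example runs.
model = TinyLogisticModel(weights=[0.1] * N_FEATURES, bias=-3.0)

# Pickle round trip: what saving the .pkl file and loading it in the API amount to.
# Only unpickle files you trust; loading a pickle can execute arbitrary code.
restored = pickle.loads(pickle.dumps(model))

test_array = [0,0,0,0,0,1,0,1,0,0,1,0,1,0,1,0,1,0,1,0,0,1,0,0,1,0,1,0,0,1,1,1,0,1,0,1,0]
print(round(restored.predict_proba(test_array), 2), restored.predict(test_array))  # → 0.18 0
```

With these toy weights the example input has 15 ones, so z = 1.5 - 3.0 = -1.5 and the probability is about 0.18, below the threshold, hence the label 0.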

Run it locally

  1. Fork the repo
  2. Clone it to your computer
  3. Run docker build -t ml-api . in the root of the project to build the API image.
  4. Run docker run -d --name ml -p 80:8080 ml-api to start a container named ml from the ml-api image, mapping port 80 on your machine to port 8080 inside the container.
  5. Open http://localhost to test the project.
  6. Send a POST request to the /predict/ endpoint; you can use this body as an example:
  {
    "test_array": [0,0,0,0,0,1,0,1,0,0,1,0,1,0,1,0,1,0,1,0,0,1,0,0,1,0,1,0,0,1,1,1,0,1,0,1,0]
  }
  7. You should get a 200 response with "prediction": 0, which means the flight is predicted not to be delayed.
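If you prefer scripting the request over using a browser or an HTTP client, here is a minimal standard-library sketch. It assumes the container is running locally with port 80 mapped as in the docker run step, and that POST /predict/ accepts the {"test_array": [...]} body shown above; the helper names are hypothetical.

```python
import json
import urllib.request
from typing import List

# Assumption: container running locally, host port 80 -> container port 8080.
BASE_URL = "http://localhost"

# Example body from the post (37 features).
SAMPLE = [0,0,0,0,0,1,0,1,0,0,1,0,1,0,1,0,1,0,1,0,0,1,0,0,1,0,1,0,0,1,1,1,0,1,0,1,0]


def build_predict_request(features: List[int], base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a JSON POST request for the /predict/ endpoint."""
    body = json.dumps({"test_array": features}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(features: List[int], base_url: str = BASE_URL, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response body."""
    req = build_predict_request(features, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the container running, predict(SAMPLE) should return the {"prediction": 0} response described above. Passing your Cloud Run URL as base_url sends the same request to the deployed service.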

How to deploy it with your GCP account

  1. Generate a service account key and add it to your repository's GitHub Secrets as GCLOUD_SERVICE_KEY
  2. Push any change to the main branch
  3. That's it! :)

A bit of stress testing for the API

  1. On macOS, install wrk with brew install wrk
  2. Run wrk -t12 -c200 -d45s -s request.lua https://mlops-api-backend-1-5gdi5qltoq-uc.a.run.app/predict/ to run a 45-second test with 12 threads keeping 200 HTTP connections open.
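wrk is the right tool for real load, but as a rough illustration of what such a test measures, here is a standard-library Python sketch. It is a swapped-in technique, not what wrk does internally: it sends a fixed number of requests from a thread pool instead of keeping connections open for a fixed duration. The post doesn't show request.lua, so the sender assumes it POSTs the JSON body shown earlier; all names here are hypothetical.

```python
import json
import statistics
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Tuple


def make_http_sender(url: str, body: dict, timeout: float = 10.0) -> Callable[[], int]:
    """Return a function that POSTs `body` as JSON to `url` and returns the status code."""
    data = json.dumps(body).encode("utf-8")

    def send() -> int:
        req = urllib.request.Request(
            url, data=data, headers={"Content-Type": "application/json"}, method="POST"
        )
        # urlopen raises HTTPError for 4xx/5xx responses; run_load_test counts that as an error.
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status

    return send


def run_load_test(send: Callable[[], int], total_requests: int, concurrency: int) -> Dict[str, float]:
    """Call `send` total_requests times from `concurrency` threads and summarize latency."""
    if total_requests < 1 or concurrency < 1:
        raise ValueError("total_requests and concurrency must be >= 1")

    def timed_call(_: int) -> Tuple[int, float]:
        start = time.perf_counter()
        try:
            status = send()
        except Exception:
            status = -1  # connection error, timeout, or HTTP error status
        return status, time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total_requests)))

    latencies = sorted(elapsed for _, elapsed in results)
    return {
        "requests": len(results),
        "errors": sum(1 for status, _ in results if status != 200),
        "median_s": statistics.median(latencies),
        "max_s": latencies[-1],
    }
```

For example, run_load_test(make_http_sender(url, {"test_array": sample}), total_requests=2000, concurrency=50), where url is your service's /predict/ URL. Python threads generate far less load than wrk, so treat the numbers as a sanity check rather than a benchmark.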

How can we improve the results?


The best approach would be horizontal scaling: in this case, we can deploy a second Google Cloud Run service and use a load balancer to distribute traffic between the two.
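In Google Cloud this job would normally fall to a managed load balancer (for example, an external HTTP(S) load balancer with serverless network endpoint groups pointing at each Cloud Run service), not to application code. The sketch below only illustrates the simplest distribution policy, round-robin, using hypothetical backend URLs.

```python
import itertools
import threading
from typing import Iterable


class RoundRobinBalancer:
    """Return backend URLs in strict rotation (safe to call from several threads)."""

    def __init__(self, backends: Iterable[str]):
        backend_list = list(backends)
        if not backend_list:
            raise ValueError("at least one backend is required")
        self._cycle = itertools.cycle(backend_list)
        self._lock = threading.Lock()  # serialize access to the shared iterator

    def next_backend(self) -> str:
        with self._lock:
            return next(self._cycle)


# Hypothetical URLs for two Cloud Run services serving the same image.
balancer = RoundRobinBalancer([
    "https://ml-api-a.example.com",
    "https://ml-api-b.example.com",
])
picks = [balancer.next_backend() for _ in range(4)]
print(picks)  # alternates a, b, a, b
```

Round-robin ignores how busy each backend is; managed load balancers can also route by region, health, or capacity, which is one reason to prefer them over client-side rotation.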
