DEV Community

Marjori Pomarole

Bootstrapping Label Studio on Google Cloud with Terraform

Say you need to quickly label a large amount of data for an ML pipeline. You have people on your team or in your university lab ready to start labeling, but you are struggling to get a tool up and running quickly so they can label concurrently.

Label Studio is a great tool for this. In this post, I show how to deploy Label Studio and how to write Terraform files that quickly bootstrap a Django application running on Cloud Run and connected to a PostgreSQL instance on Cloud SQL.

The code can be found in this pull request.

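1. First, log in to your Google Cloud account. The command below stores Application Default Credentials, which Terraform picks up; if the gcloud CLI itself is not authenticated yet, also run `gcloud auth login`.

```shell
gcloud auth application-default login
```
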
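2. Build the Docker image with Cloud Build and push it to Container Registry using this cloudbuild.yaml file:

```yaml
steps:
  # Build the image from the Dockerfile in _SERVICE_FOLDER
  - id: "build image"
    name: "gcr.io/cloud-builders/docker"
    args:
      [
        "build",
        "-t",
        "gcr.io/${PROJECT_ID}/${_SERVICE_NAME}",
        "${_SERVICE_FOLDER}",
      ]

  # Push the image to Container Registry so Cloud Run can pull it
  - id: "push image"
    name: "gcr.io/cloud-builders/docker"
    args: ["push", "gcr.io/${PROJECT_ID}/${_SERVICE_NAME}"]

# User-defined substitutions start with an underscore;
# PROJECT_ID is filled in by Cloud Build automatically
substitutions:
  _SERVICE_FOLDER: .
  _SERVICE_NAME: label-studio

# Images listed here are pushed and recorded in the build results
images:
  - "gcr.io/${PROJECT_ID}/${_SERVICE_NAME}"
```

Submit the build from the folder that contains the Dockerfile with `gcloud builds submit --config cloudbuild.yaml .`.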

I used direnv to set the PROJECT_ID environment variable to the ID of the Google Cloud project I created.

3. Set up the Terraform files to bootstrap the project. The files I used are listed below. Once they are in place, run the following commands to create the resources:

```shell
terraform init
terraform apply -var="database_password=<DB_PASSWORD>" \
                -var="project=<PROJECT_ID>" \
                -var="database_user=<DB_USER>" \
                -var="domain_name=<DOMAIN_NAME>"
```

After about 10 to 15 minutes, all the resources will be created and Label Studio will be up and running, ready for you to create users and start labeling!

main.tf

```hcl
terraform {
  required_providers {
    google = {
      source = "hashicorp/google"
      # Any 3.x release from 3.80 onward (excludes 4.0)
      version = "~> 3.80"
    }
  }
}

# Resources are created in this project unless they set their own
provider "google" {
  project = var.project
}

locals {
  service_folder = "service"
  service_name   = "label-studio"

  deployment_name = "label-studio"
  # IAM member string for the worker service account, used in project.tf
  label_studio_worker_sa = "serviceAccount:${google_service_account.label_studio_worker.email}"
}
```

project.tf

This file enables the APIs the project needs and creates the service account that runs the application, granting it permission to write logs and connect to Cloud SQL.

```hcl
# Enable services
resource "google_project_service" "run" {
  service            = "run.googleapis.com"
  disable_on_destroy = false
}

resource "google_project_service" "iam" {
  service            = "iam.googleapis.com"
  disable_on_destroy = false
}

resource "google_project_service" "cloudbuild" {
  service            = "cloudbuild.googleapis.com"
  disable_on_destroy = false
}

# Create a service account
resource "google_service_account" "label_studio_worker" {
  account_id   = "label-studio-worker"
  display_name = "Label Studio SA"
}

# Set permissions. google_project_iam_member only adds this service account
# to each role; google_project_iam_binding is authoritative for the role and
# would remove every other member of roles/logging.logWriter and
# roles/cloudsql.client in the project.
resource "google_project_iam_member" "service_permissions" {
  for_each = toset([
    "logging.logWriter", "cloudsql.client"
  ])

  project    = var.project
  role       = "roles/${each.key}"
  member     = local.label_studio_worker_sa
  depends_on = [google_service_account.label_studio_worker]
}
```

database.tf

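```hcl
# Creating the instance and Cloud Run's Cloud SQL connection both use the
# Cloud SQL Admin API, which a new project does not have enabled
resource "google_project_service" "sqladmin" {
  service            = "sqladmin.googleapis.com"
  disable_on_destroy = false
}

# The Cloud SQL postgres
resource "google_sql_database_instance" "label-studio-postgres" {
  name             = "label-studio-sql"
  database_version = "POSTGRES_13"
  region           = var.region

  settings {
    tier = var.database_tier
  }

  depends_on = [google_project_service.sqladmin]
}

resource "google_sql_user" "users" {
  name     = var.database_user
  instance = google_sql_database_instance.label-studio-postgres.name
  password = var.database_password
}
```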
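The service connects to the default `postgres` database that Cloud SQL creates on every PostgreSQL instance. If you would rather keep Label Studio's tables in a database of their own, a minimal sketch follows; the resource name `label_studio` and the database name `labelstudio` are hypothetical, and you would then point `POSTGRE_NAME` in service.tf at the new database.

```hcl
# Hypothetical dedicated database for Label Studio's tables, created on the
# same Cloud SQL instance as the user above
resource "google_sql_database" "label_studio" {
  name     = "labelstudio"
  instance = google_sql_database_instance.label-studio-postgres.name
}

# In service.tf, the POSTGRE_NAME entry would then reference it:
#   env {
#     name  = "POSTGRE_NAME"
#     value = google_sql_database.label_studio.name
#   }
```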

service.tf

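This file runs the service on Cloud Run, makes it publicly accessible through an IAM policy, and maps your custom domain to it. Cloud Run only maps a domain whose ownership you have verified, and you still need to add the DNS records the mapping reports at your DNS provider.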

```hcl
# The Cloud Run service
resource "google_cloud_run_service" "label-studio" {
  name                       = local.service_name
  location                   = var.region
  autogenerate_revision_name = true

  template {
    spec {
      service_account_name = google_service_account.label_studio_worker.email
      containers {
        image = "gcr.io/${var.project}/${local.service_name}"
        env {
          name  = "DEBUG"
          value = "True"
        }
        env {
          name  = "LOG_LEVEL"
          value = "DEBUG"
        }
        # Label Studio reads its PostgreSQL settings from the POSTGRE_* variables
        env {
          name  = "DJANGO_DB"
          value = "default"
        }
        env {
          name  = "POSTGRE_USER"
          value = google_sql_user.users.name
        }
        env {
          name  = "POSTGRE_PASSWORD"
          value = google_sql_user.users.password
        }
        env {
          name  = "POSTGRE_NAME"
          value = "postgres"
        }
        env {
          name = "POSTGRE_HOST"
          # Unix socket directory mounted through the cloudsql-instances annotation below
          value = "/cloudsql/${google_sql_database_instance.label-studio-postgres.connection_name}"
        }
        env {
          name  = "POSTGRE_PORT"
          value = "5432"
        }
        env {
          name  = "GOOGLE_LOGGING_ENABLED"
          value = "True"
        }
      }
    }

    metadata {
      annotations = {
        "autoscaling.knative.dev/maxScale"      = "1000"
        "run.googleapis.com/cloudsql-instances" = google_sql_database_instance.label-studio-postgres.connection_name
        "run.googleapis.com/client-name"        = "terraform"
      }
    }
  }

  traffic {
    percent         = 100
    latest_revision = true
  }

  depends_on = [google_project_service.run, google_sql_database_instance.label-studio-postgres]
}

# Make the service publicly accessible
data "google_iam_policy" "noauth" {
  binding {
    role = "roles/run.invoker"
    members = [
      "allUsers",
    ]
  }
}

resource "google_cloud_run_service_iam_policy" "noauth" {
  location = google_cloud_run_service.label-studio.location
  project  = google_cloud_run_service.label-studio.project
  service  = google_cloud_run_service.label-studio.name

  policy_data = data.google_iam_policy.noauth.policy_data
  depends_on  = [google_cloud_run_service.label-studio]
}

resource "google_cloud_run_domain_mapping" "default" {
  location = var.region
  name     = var.domain_name

  metadata {
    namespace = var.project
  }

  spec {
    route_name = google_cloud_run_service.label-studio.name
  }
}
```
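After `terraform apply` you still need the service URL and, for the custom domain, the DNS records to create at your DNS provider. A minimal outputs.tf sketch that surfaces both is below; the file and output names are my own choice, and it assumes the `status` attributes the Google provider exports for these two resources. The DNS records may only be populated once Cloud Run has processed the mapping, so a later `terraform refresh` or `terraform apply` may be needed before they show up.

```hcl
# Hypothetical outputs.tf: print useful values after `terraform apply`

# The default *.run.app URL of the Cloud Run service
output "service_url" {
  value = google_cloud_run_service.label-studio.status[0].url
}

# DNS records Cloud Run asks you to create for the custom domain
output "domain_dns_records" {
  value = google_cloud_run_domain_mapping.default.status[0].resource_records
}
```

With this file in place, `terraform output service_url` prints the URL without opening the console.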

variables.tf

```hcl
variable "project" {
  type        = string
  description = "Google Cloud Platform Project ID"
}

variable "region" {
  default     = "us-central1"
  type        = string
  description = "Region for the Cloud Run service, its domain mapping, and the Cloud SQL instance."
}

variable "database_user" {
  default     = "postgres"
  type        = string
  description = "PostgreSQL user."
}

variable "database_password" {
  type        = string
  description = "PostgreSQL database user password."
  sensitive   = true
}

variable "database_tier" {
  default     = "db-f1-micro"
  type        = string
  description = "Cloud SQL machine tier for the PostgreSQL instance."
}

variable "domain_name" {
  type        = string
  description = "Domain name where the service will be served from."
}
```
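Rather than passing each value with `-var` on the command line, Terraform automatically loads a file named `terraform.tfvars` from the working directory. A sketch with placeholder values (all hypothetical) is below. If the file is committed to version control, keep the password out of it: Terraform also reads variables from environment variables prefixed with `TF_VAR_`, so `TF_VAR_database_password` can supply it instead.

```hcl
# terraform.tfvars: loaded automatically by `terraform plan` and `terraform apply`
# All values below are placeholders; replace them with your own

project     = "my-gcp-project-id"
domain_name = "labels.example.com"

# Optional overrides of the defaults in variables.tf
# region        = "us-central1"
# database_tier = "db-f1-micro"
```

With the file in place and the password exported, `terraform apply` runs without any `-var` flags.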
