Say you need to quickly label large amounts of data for a machine learning pipeline. You have people on your team or in your university lab ready to start labeling, but you are struggling to get a tool up and running quickly so they can label concurrently.
Label Studio is a great tool for this. In this post, I show how to deploy Label Studio and how to write Terraform files that quickly bootstrap a Django application running on Google Cloud Run, connected to a Postgres instance on Google Cloud SQL.
The code can be found in this pull request.
- First, log in to your Google Cloud account.
gcloud auth application-default login
- Build the Docker image and push it to Google Container Registry using Cloud Build (for example with gcloud builds submit --config cloudbuild.yaml) and this cloudbuild.yaml script:
steps:
  - id: "build image"
    name: "gcr.io/cloud-builders/docker"
    args:
      [
        "build",
        "-t",
        "gcr.io/${PROJECT_ID}/${_SERVICE_NAME}",
        "${_SERVICE_FOLDER}",
      ]
  - id: "push image"
    name: "gcr.io/cloud-builders/docker"
    args: ["push", "gcr.io/${PROJECT_ID}/${_SERVICE_NAME}"]
substitutions:
  _SERVICE_FOLDER: .
  _SERVICE_NAME: label-studio
images:
  - "gcr.io/${PROJECT_ID}/${_SERVICE_NAME}"
I used direnv to set the PROJECT_ID environment variable to the ID of the Google Cloud project I created.
- Set up the Terraform files to bootstrap the project. The Terraform files I wrote are listed below; to create the resources, run the following commands:
terraform init
terraform apply -var="database_password=<DB_PASSWORD>" \
  -var="project=<PROJECT_ID>" \
  -var="database_user=<DB_USER>" \
  -var="domain_name=<DOMAIN_NAME>"
After about 10-15 minutes, all the resources will be created, and Label Studio will be up and running, ready for you to create users and start labeling!
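If you would rather not pass every variable on the command line, one option is a terraform.tfvars file, which Terraform loads automatically when it sits next to the other .tf files. The sketch below is not part of the original setup, and every value in it is a hypothetical placeholder to replace with your own:

```hcl
# terraform.tfvars (hypothetical values, replace with your own)
# Terraform reads this file automatically during plan and apply.
project           = "my-gcp-project-id"  # assumed project ID
database_user     = "postgres"
database_password = "change-me"          # placeholder, pick a strong password
domain_name       = "labels.example.com" # assumed domain
```

With this file in place, a plain terraform apply should be enough. Since it contains the database password, it is probably best kept out of version control.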
main.tf
terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 3.80"
    }
  }
}

provider "google" {
  project = var.project
}

locals {
  service_folder         = "service"
  service_name           = "label-studio"
  deployment_name        = "label-studio"
  label_studio_worker_sa = "serviceAccount:${google_service_account.label_studio_worker.email}"
}
project.tf
This file creates the service account used to run the application and enables the APIs and services it depends on.
# Enable services
resource "google_project_service" "run" {
  service            = "run.googleapis.com"
  disable_on_destroy = false
}

resource "google_project_service" "iam" {
  service            = "iam.googleapis.com"
  disable_on_destroy = false
}

resource "google_project_service" "cloudbuild" {
  service            = "cloudbuild.googleapis.com"
  disable_on_destroy = false
}

# Create a service account
resource "google_service_account" "label_studio_worker" {
  account_id   = "label-studio-worker"
  display_name = "Label Studio SA"
}

# Set permissions
resource "google_project_iam_binding" "service_permissions" {
  for_each = toset([
    "logging.logWriter", "cloudsql.client"
  ])
  role       = "roles/${each.key}"
  members    = [local.label_studio_worker_sa]
  depends_on = [google_service_account.label_studio_worker]
}
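Two optional variations on project.tf, sketched here under assumptions rather than taken from the original code. First, Cloud SQL is managed through the Cloud SQL Admin API, so if it is not already enabled in your project, a resource in the same style as the ones above could enable it. Second, google_project_iam_binding is authoritative for a role, meaning it replaces any other members already holding that role in the project; google_project_iam_member only adds the one member. The resource names sqladmin and service_permissions_member are hypothetical.

```hcl
# Assumption: the Cloud SQL Admin API is not yet enabled in the project.
resource "google_project_service" "sqladmin" {
  service            = "sqladmin.googleapis.com"
  disable_on_destroy = false
}

# Non-authoritative alternative to the google_project_iam_binding above:
# grants each role to the service account without touching other members.
resource "google_project_iam_member" "service_permissions_member" {
  for_each = toset([
    "logging.logWriter", "cloudsql.client"
  ])
  project = var.project
  role    = "roles/${each.key}"
  member  = local.label_studio_worker_sa
}
```

If you adopt the member-based variant, it would replace the binding resource rather than sit alongside it.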
database.tf
# The Cloud SQL postgres
resource "google_sql_database_instance" "label-studio-postgres" {
  name             = "label-studio-sql"
  database_version = "POSTGRES_13"
  region           = var.region
  settings {
    tier = var.database_tier
  }
}

resource "google_sql_user" "users" {
  name     = var.database_user
  instance = google_sql_database_instance.label-studio-postgres.name
  password = var.database_password
}
service.tf
This file runs the service, sets up the IAM policy, and creates the domain mapping.
# The Cloud Run service
resource "google_cloud_run_service" "label-studio" {
  name                       = local.service_name
  location                   = var.region
  autogenerate_revision_name = true

  template {
    spec {
      service_account_name = google_service_account.label_studio_worker.email
      containers {
        image = "gcr.io/${var.project}/${local.service_name}"
        env {
          name  = "DEBUG"
          value = "True"
        }
        env {
          name  = "LOG_LEVEL"
          value = "DEBUG"
        }
        env {
          name  = "DJANGO_DB"
          value = "default"
        }
        env {
          name = "POSTGRE_USER"
          # Use the SQL user created in database.tf so it matches the password below
          value = google_sql_user.users.name
        }
        env {
          name  = "POSTGRE_PASSWORD"
          value = google_sql_user.users.password
        }
        env {
          name  = "POSTGRE_NAME"
          value = "postgres"
        }
        env {
          name  = "POSTGRE_HOST"
          value = "/cloudsql/${google_sql_database_instance.label-studio-postgres.connection_name}"
        }
        env {
          name  = "POSTGRE_PORT"
          value = "5432"
        }
        env {
          name  = "GOOGLE_LOGGING_ENABLED"
          value = "True"
        }
      }
    }

    metadata {
      annotations = {
        "autoscaling.knative.dev/maxScale"      = "1000"
        "run.googleapis.com/cloudsql-instances" = google_sql_database_instance.label-studio-postgres.connection_name
        "run.googleapis.com/client-name"        = "terraform"
      }
    }
  }

  traffic {
    percent         = 100
    latest_revision = true
  }

  depends_on = [google_project_service.run, google_sql_database_instance.label-studio-postgres]
}
# Set service public
data "google_iam_policy" "noauth" {
  binding {
    role = "roles/run.invoker"
    members = [
      "allUsers",
    ]
  }
}

resource "google_cloud_run_service_iam_policy" "noauth" {
  location    = google_cloud_run_service.label-studio.location
  project     = google_cloud_run_service.label-studio.project
  service     = google_cloud_run_service.label-studio.name
  policy_data = data.google_iam_policy.noauth.policy_data
  depends_on  = [google_cloud_run_service.label-studio]
}

resource "google_cloud_run_domain_mapping" "default" {
  location = var.region
  name     = var.domain_name
  metadata {
    namespace = var.project
  }
  spec {
    route_name = google_cloud_run_service.label-studio.name
  }
}
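To see where the service ended up without opening the Cloud Console, a small outputs file can help. This is a sketch that is not part of the original code: the file name outputs.tf and the output names are hypothetical, and the status attributes are used as I understand them from the Google provider documentation for these two resources.

```hcl
# outputs.tf (hypothetical file, not in the original setup)

# URL that Cloud Run assigns to the deployed service.
output "service_url" {
  description = "URL assigned to the Cloud Run service."
  value       = google_cloud_run_service.label-studio.status[0].url
}

# DNS records to add at your domain registrar for the domain mapping.
output "domain_dns_records" {
  description = "DNS records required for the custom domain mapping."
  value       = google_cloud_run_domain_mapping.default.status[0].resource_records
}
```

After terraform apply, running terraform output should print both values.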
variables.tf
variable "project" {
  type        = string
  description = "Google Cloud Platform Project ID"
}

variable "region" {
  default = "us-central1"
  type    = string
}

variable "database_user" {
  default     = "postgres"
  type        = string
  description = "PostgreSQL user."
}

variable "database_password" {
  type        = string
  description = "PostgreSQL database user password."
}

variable "database_tier" {
  default     = "db-f1-micro"
  type        = string
  description = "PostgreSQL database tier."
}

variable "domain_name" {
  type        = string
  description = "Domain name where service will be served from."
}
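Because database_password holds a secret, you may want Terraform to redact it from plan and apply output. A possible variant of the declaration above, assuming Terraform 0.14 or newer (where the sensitive argument on variables is available), would replace the existing block rather than be added next to it:

```hcl
# Variant of the database_password variable (assumes Terraform >= 0.14).
variable "database_password" {
  type        = string
  description = "PostgreSQL database user password."
  sensitive   = true # hide the value in plan/apply output
}
```

Note that the value is still written to the Terraform state file, so the state itself should be stored somewhere access-controlled.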