
Solving "Use Machine Learning APIs on Google Cloud: Challenge Lab" — A Complete Guide

Introduction

This challenge lab tests your ability to build an end-to-end pipeline that extracts text from images using the Cloud Vision API, translates it with the Cloud Translation API, and loads the results into BigQuery. Unlike guided labs, you're expected to fill in the blanks of a partially written Python script and configure IAM permissions yourself.

Let's walk through every task with clear explanations of why each step matters.


The Architecture

The pipeline works like this:

  1. A Python script reads image files from a Cloud Storage bucket
  2. Each image is sent to the Cloud Vision API for text detection
  3. The extracted text is saved back to Cloud Storage as a .txt file
  4. If a language locale was detected (locale != ''), the text is sent to the Translation API to get a Japanese translation; text with no detected locale is kept as-is
  5. All results (original text, locale, translation) are uploaded to a BigQuery table.
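The per-image control flow can be sketched in plain Python. This is a hedged sketch, not the lab script itself: ocr, translate, and the sample values are placeholders standing in for the real Vision/Translation API calls.

```python
# Sketch of the per-image pipeline logic. The ocr and translate callables
# are placeholders for the real Vision and Translation API calls.
def process_image(name, ocr, translate, rows):
    text, locale = ocr(name)          # Vision API: extract text + language code
    if locale == '':
        translated = text             # no language detected: keep original text
    else:
        translated = translate(text)  # Translation API: translate to Japanese
    rows.append((text, locale, translated, name))

rows = []
# Fabricated sample inputs to exercise both branches:
process_image('sign1.jpg', lambda n: ('Hello', 'en'), lambda t: 'こんにちは', rows)
process_image('sign2.jpg', lambda n: ('???', ''), lambda t: t, rows)
```

Each buffered tuple later becomes one BigQuery row, which is exactly how the real script accumulates its results.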



Task 1: Configure a Service Account

Why a Service Account?

The Python script needs programmatic access to Vision API, Translation API, Cloud Storage, and BigQuery. A service account acts as the script's identity, and IAM roles define what it can do.

Commands

# Set your project ID
export PROJECT_ID=$(gcloud config get-value project)

# Create the service account
gcloud iam service-accounts create my-ml-sa \
  --display-name="ML API Service Account"

# Grant BigQuery Data Editor role (to insert rows)
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:my-ml-sa@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataEditor"

# Grant Cloud Storage Object Admin role (to read images and write text files)
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:my-ml-sa@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role="roles/storage.objectAdmin"

# Grant Service Usage Consumer role (required to make API calls within the project)
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:my-ml-sa@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role="roles/serviceusage.serviceUsageConsumer"

Important: Without roles/serviceusage.serviceUsageConsumer, the service account cannot consume any enabled APIs in the project (BigQuery, Vision, Translation, etc.), even if it has data-level roles like dataEditor or storage.objectAdmin. This results in a 403 USER_PROJECT_DENIED error.

Verification

gcloud projects get-iam-policy $PROJECT_ID \
  --flatten="bindings[].members" \
  --filter="bindings.members:my-ml-sa@"

You should see roles/bigquery.dataEditor, roles/storage.objectAdmin, and roles/serviceusage.serviceUsageConsumer listed.


Task 2: Create and Download Credentials

Why Download a Key?

While Cloud Shell has default credentials for the logged-in user, the challenge explicitly requires you to create a JSON key file and point the GOOGLE_APPLICATION_CREDENTIALS environment variable to it. This simulates how credentials work in production environments outside GCP.
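The client libraries read GOOGLE_APPLICATION_CREDENTIALS at startup, and the lab script performs the same sanity check before doing anything else. A standalone version of that guard, using only the standard library (the function name is my own, not from the script):

```python
import os

def credentials_path_ok(env=os.environ):
    """Mirror the script's guard: the env var must be set AND point to a real file."""
    path = env.get('GOOGLE_APPLICATION_CREDENTIALS')
    if path is None:
        return False, 'GOOGLE_APPLICATION_CREDENTIALS is not defined'
    if not os.path.exists(path):
        return False, 'The GOOGLE_APPLICATION_CREDENTIALS file does not exist'
    return True, path
```

Passing a dict instead of os.environ makes the check easy to test without touching your shell environment.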

Commands

# Generate the JSON key file
gcloud iam service-accounts keys create ml-sa-key.json \
  --iam-account=my-ml-sa@${PROJECT_ID}.iam.gserviceaccount.com

# Set the environment variable so Google Cloud client libraries find the key
export GOOGLE_APPLICATION_CREDENTIALS=${PWD}/ml-sa-key.json

Task 3: Modify the Script — Vision API Text Detection

Get the Script

gsutil cp gs://$PROJECT_ID/analyze-images-v2.py .

What to Modify

The script has four sections that need your attention: three # TBD: comments and one commented-out BigQuery upload line. Open the script with:

nano analyze-images-v2.py

TBD #1 — Create a Vision API image object:

Find the comment:

# TBD: Create a Vision API image object called image_object

Add below it:

image_object = vision.Image(content=file_content)

This creates an Image object from the raw bytes downloaded from Cloud Storage (file_content). The Vision API requires this object format to process images.

TBD #2 — Call the Vision API to detect text:

Find the comment:

# TBD: Detect text in the image and save the response data into an object called response

Add below it:

response = vision_client.document_text_detection(image=image_object)

This sends the image to the Vision API's document_text_detection method, whose OCR model is optimized for dense, document-style text and handles the lab's sign images well. Note that the client variable is called vision_client (as defined earlier in the script), and the image parameter uses the image_object we just created.
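What matters for the next steps is the shape of the response. A stand-in built from plain Python objects (fabricated sample data, not real API types or output) illustrates the parts the script relies on:

```python
from types import SimpleNamespace

# Stand-in for a Vision API response: text_annotations[0] carries the full
# concatenated text plus the detected locale; later entries are word-level.
response = SimpleNamespace(text_annotations=[
    SimpleNamespace(description='STOP\nHERE', locale='en'),  # full text
    SimpleNamespace(description='STOP', locale=''),          # word-level
    SimpleNamespace(description='HERE', locale=''),
])

desc = response.text_annotations[0].description    # 'STOP\nHERE'
locale = response.text_annotations[0].locale       # 'en'
```

The real response objects are protocol-buffer messages, but the attribute access pattern is the same one the script uses.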

Test It

Run the script after completing TBDs #1 and #2 to verify text extraction works before moving on:

python3 analyze-images-v2.py $PROJECT_ID $PROJECT_ID

You should see extracted text appearing in the console output.


Task 4: Modify the Script — Translation API

What to Modify

TBD #3 — Translate non-Japanese text to Japanese:

Find the comment:

# TBD: According to the target language pass the description data to the translation API

Add below it:

translation = translate_client.translate(desc, target_language='ja')

Key details:

  • We use desc (not a generic variable like text) because that's the variable name the script assigns to the extracted description earlier: desc = response.text_annotations[0].description
  • The target language is 'ja' (Japanese) as specified in the lab instructions
  • The result is stored in translation, and the script already accesses translation['translatedText'] on the next line
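The access pattern on the returned dictionary can be shown with plain Python. The values below are fabricated sample data, not real API output:

```python
# The Translation API's translate() returns a dict; the script reads the
# 'translatedText' key on the line after the TBD.
translation = {
    'translatedText': 'こんにちは',
    'detectedSourceLanguage': 'en',
    'input': 'Hello',
}
translated_text = translation['translatedText']
```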

Enable the BigQuery Upload

At the very end of the script, find the commented-out line:

# errors = bq_client.insert_rows(table, rows_for_bq)

Remove the # to enable it:

errors = bq_client.insert_rows(table, rows_for_bq)

The line immediately after (assert errors == []) will verify the upload succeeded.

Complete Modified Script Reference

Here's a summary of all four changes in the script:

  • After # TBD: Create a Vision API image object → add image_object = vision.Image(content=file_content)
  • After # TBD: Detect text in the image → add response = vision_client.document_text_detection(image=image_object)
  • After # TBD: According to the target language → add translation = translate_client.translate(desc, target_language='ja')
  • Last commented line → remove the # from errors = bq_client.insert_rows(table, rows_for_bq)

Run the Complete Script

python3 analyze-images-v2.py $PROJECT_ID $PROJECT_ID

Watch the output — you should see text being extracted from each image, locale detection, and Japanese translations for non-Japanese text, followed by "Writing Vision API image data to BigQuery..."


Understanding the Python Script (analyze-images-v2.py)

Before modifying the script, it's important to understand what it does. Here's a general overview followed by a line-by-line breakdown.

General Overview

The script is an automated image-processing pipeline. It connects to four Google Cloud services simultaneously: Cloud Storage (to read images and write text files), Vision API (to extract text from images via OCR), Translation API (to translate non-Japanese text into Japanese), and BigQuery (to store the final results in a queryable table).

The workflow for each image is: download the image bytes from the bucket → send them to the Vision API → save the detected text back to Cloud Storage as a .txt file → check the language locale → if not Japanese, translate to Japanese → collect all results → batch-upload everything to BigQuery at the end.

Line-by-Line Breakdown

# Dataset: image_classification_dataset
# Table name: image_text_detail
import os
import sys

Lines 1-4: Comments documenting the target BigQuery dataset/table. Imports os (to read environment variables) and sys (to read command-line arguments).

from google.cloud import storage, bigquery, language, vision, translate_v2

Line 7: Imports the five Google Cloud client libraries. storage for Cloud Storage, bigquery for BigQuery, language for Natural Language API (not used in this script but imported from the original template), vision for Vision API, and translate_v2 for the Translation API.

if ('GOOGLE_APPLICATION_CREDENTIALS' in os.environ):
    if (not os.path.exists(os.environ['GOOGLE_APPLICATION_CREDENTIALS'])):
        print ("The GOOGLE_APPLICATION_CREDENTIALS file does not exist.\n")
        exit()
else:
    print ("The GOOGLE_APPLICATION_CREDENTIALS environment variable is not defined.\n")
    exit()

Lines 9-15: Credentials check. Verifies two things: (1) the GOOGLE_APPLICATION_CREDENTIALS environment variable is set, and (2) the file it points to actually exists on disk. If either check fails, the script exits immediately with an error message. This is a safety gate — without valid credentials, no API call will work.

if len(sys.argv)<3:
    print('You must provide parameters for the Google Cloud project ID and Storage bucket')
    print ('python3 '+sys.argv[0]+ '[PROJECT_NAME] [BUCKET_NAME]')
    exit()

project_name = sys.argv[1]
bucket_name = sys.argv[2]

Lines 17-23: Argument parsing. The script requires two command-line arguments: the GCP project ID and the Cloud Storage bucket name. In this lab, both are the same value (your project ID). If you forget to pass them, the script prints usage instructions and exits.
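That check is simple enough to factor into a testable function. This is a sketch of my own (the lab script does the check inline, not via a helper):

```python
def parse_args(argv):
    """Return (project, bucket) from the argument list, or None if too few were given."""
    if len(argv) < 3:
        print('You must provide parameters for the Google Cloud project ID and Storage bucket')
        return None
    return argv[1], argv[2]
```

In the lab, both values are your project ID, so the call is parse_args(['analyze-images-v2.py', project_id, project_id]).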

storage_client = storage.Client()
bq_client = bigquery.Client(project=project_name)
nl_client = language.LanguageServiceClient()

Lines 26-28: Client initialization (part 1). Creates client objects for Cloud Storage, BigQuery (bound to your project), and the Natural Language API. The nl_client is inherited from the original template but not used in this challenge.

vision_client = vision.ImageAnnotatorClient()
translate_client = translate_v2.Client()

Lines 31-32: Client initialization (part 2). Creates the Vision API client (for text detection) and the Translation API client (for translating text). These are the two ML API clients you'll use in the TBD sections.

dataset_ref = bq_client.dataset('image_classification_dataset')
dataset = bigquery.Dataset(dataset_ref)
table_ref = dataset.table('image_text_detail')
table = bq_client.get_table(table_ref)

Lines 35-38: BigQuery table setup. Creates a reference chain: dataset name → dataset object → table name → table object. The get_table() call actually contacts BigQuery to verify the table exists and retrieves its schema. This is where the 403 USER_PROJECT_DENIED error occurs if the service account lacks the serviceUsageConsumer role.

rows_for_bq = []

Line 41: Results buffer. Initializes an empty list that will accumulate tuples of (description, locale, translated_text, filename) for each processed image. These get batch-uploaded to BigQuery at the end.

files = storage_client.bucket(bucket_name).list_blobs()
bucket = storage_client.bucket(bucket_name)

Lines 44-45: Bucket access. list_blobs() returns an iterator over every file (blob) in the bucket. The bucket object is saved separately because we'll need it later to upload text files.

print('Processing image files from GCS. This will take a few minutes..')

Line 47: Status message so you know the script is working.

for file in files:
    if file.name.endswith('jpg') or  file.name.endswith('png'):
        file_content = file.download_as_string()

Lines 50-52: Main loop start. Iterates over every blob in the bucket, filters for image files (.jpg or .png), and downloads the image as raw bytes into file_content.

        # TBD: Create a Vision API image object called image_object
        image_object = vision.Image(content=file_content)    # ← YOU ADD THIS

Line 55 (TBD #1): Wraps the raw image bytes into a vision.Image object. The Vision API cannot accept raw bytes directly — it needs this structured object that can hold either image bytes (content) or a GCS URI (source).

        # TBD: Detect text in the image and save the response data into an object called response
        response = vision_client.document_text_detection(image=image_object)    # ← YOU ADD THIS

Line 59 (TBD #2): Sends the image to the Vision API's document_text_detection method. This performs OCR (Optical Character Recognition) optimized for dense text. The response contains a list of text_annotations — the first element holds the full concatenated text and the detected language.

        text_data = response.text_annotations[0].description

Line 62: Extracts the full detected text from the first annotation. The text_annotations array always puts the complete text in index [0], with individual word-level detections in subsequent indices.

        file_name = file.name.split('.')[0] + '.txt'
        blob = bucket.blob(file_name)
        blob.upload_from_string(text_data, content_type='text/plain')

Lines 65-67: Save text to Cloud Storage. Converts the image filename (e.g., sign1.jpg) to a text filename (sign1.txt), creates a blob reference, and uploads the extracted text. This creates a text file in the same bucket for each processed image.
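The filename conversion is plain string manipulation and easy to verify in isolation (the helper name is mine, for illustration):

```python
def to_text_name(image_name):
    """sign1.jpg -> sign1.txt; split('.') keeps only the part before the FIRST dot."""
    return image_name.split('.')[0] + '.txt'
```

Note the caveat in the comment: a name with extra dots (e.g. a.b.png) is truncated at the first dot. That's harmless for the lab's simple filenames but worth knowing if you reuse the pattern.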

        desc = response.text_annotations[0].description
        locale = response.text_annotations[0].locale

Lines 72-73: Extracts the description (full text) and locale (language code like 'en', 'ja', 'fr') from the response. Note that desc is the same value as text_data — the script extracts it again for clarity of variable naming.

        if locale == '':
            translated_text = desc
        else:
            # TBD: According to the target language pass the description data to the translation API
            translation = translate_client.translate(desc, target_language='ja')    # ← YOU ADD THIS

            translated_text = translation['translatedText']

Lines 77-83 (TBD #3): Translation logic. If the locale is empty (no language detected), the original text is used as-is. Otherwise, the text is sent to the Translation API with target_language='ja' (Japanese). The API returns a dictionary; the translated text is in the 'translatedText' key.

        print(translated_text)

Line 84: Prints the translated (or original) text to the console so you can monitor progress.

        if len(response.text_annotations) > 0:
            rows_for_bq.append((desc, locale, translated_text, file.name))

Lines 88-89: Collect results. If the Vision API found any text (safety check), appends a tuple with the original text, locale, translated text, and filename to the results buffer. This tuple matches the BigQuery table schema.
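Each buffered row is a plain tuple whose element order must match the table's column order. Here is a fabricated example of what rows_for_bq might hold after two images (sample values, not real lab output):

```python
# (description, locale, translated_text, file_name) — the order must match the
# image_text_detail table schema.
rows_for_bq = [
    ('STOP', 'en', '止まれ', 'sign1.jpg'),
    ('出口', '', '出口', 'sign2.jpg'),   # no locale detected: text kept as-is
]
```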

print('Writing Vision API image data to BigQuery...')
errors = bq_client.insert_rows(table, rows_for_bq)    # ← YOU UNCOMMENT THIS
assert errors == []

Lines 91-93: BigQuery upload. After all images are processed, uses insert_rows() to perform a streaming insert of all collected rows into the BigQuery table. The assert verifies that no errors occurred — if any row failed to insert, the script crashes with an AssertionError.


Task 5: Validate with BigQuery

Run the Verification Query

Go to BigQuery in the Console or use the CLI:

bq query --use_legacy_sql=false \
  'SELECT locale, COUNT(locale) as lcount FROM image_classification_dataset.image_text_detail GROUP BY locale ORDER BY lcount DESC'

You should see a breakdown of language codes (e.g., ja, en, fr, de) with their counts. This confirms the full pipeline worked end-to-end.


Quick Reference — All Commands in Order

# ============================================
# TASK 1: Create service account + bind roles
# ============================================
export PROJECT_ID=$(gcloud config get-value project)

gcloud iam service-accounts create my-ml-sa \
  --display-name="ML API Service Account"

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:my-ml-sa@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataEditor"

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:my-ml-sa@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role="roles/storage.objectAdmin"

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:my-ml-sa@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role="roles/serviceusage.serviceUsageConsumer"

# ============================================
# TASK 2: Create credentials + set env var
# ============================================
gcloud iam service-accounts keys create ml-sa-key.json \
  --iam-account=my-ml-sa@${PROJECT_ID}.iam.gserviceaccount.com

export GOOGLE_APPLICATION_CREDENTIALS=${PWD}/ml-sa-key.json

# ============================================
# TASK 3 & 4: Copy and modify the script
# ============================================
gsutil cp gs://$PROJECT_ID/analyze-images-v2.py .
nano analyze-images-v2.py

# --- Inside nano, make these 4 edits: ---
# 1. After "TBD: Create a Vision API image object":
#        image_object = vision.Image(content=file_content)
#
# 2. After "TBD: Detect text in the image":
#        response = vision_client.document_text_detection(image=image_object)
#
# 3. After "TBD: According to the target language":
#        translation = translate_client.translate(desc, target_language='ja')
#
# 4. Uncomment the last line:
#        errors = bq_client.insert_rows(table, rows_for_bq)
# --- Save with Ctrl+O, Enter, Ctrl+X ---

# ============================================
# TASK 5: Run script and validate
# ============================================
python3 analyze-images-v2.py $PROJECT_ID $PROJECT_ID

bq query --use_legacy_sql=false \
  'SELECT locale, COUNT(locale) as lcount FROM image_classification_dataset.image_text_detail GROUP BY locale ORDER BY lcount DESC'

Troubleshooting

  • 403 USER_PROJECT_DENIED on BigQuery or API calls: add the missing role with gcloud projects add-iam-policy-binding $PROJECT_ID --member="serviceAccount:my-ml-sa@${PROJECT_ID}.iam.gserviceaccount.com" --role="roles/serviceusage.serviceUsageConsumer", then wait 1-2 minutes for propagation
  • 403 ACCESS_DENIED on Cloud Storage: you may have used roles/storage.admin instead of roles/storage.objectAdmin; bind the correct role
  • PERMISSION_DENIED on Vision/Translate API calls: enable the APIs with gcloud services enable vision.googleapis.com translate.googleapis.com
  • PERMISSION_DENIED on BigQuery: verify the dataEditor role was bound correctly; wait 1-2 minutes for IAM propagation
  • ModuleNotFoundError: install packages with pip3 install google-cloud-vision google-cloud-translate google-cloud-bigquery google-cloud-storage google-cloud-language
  • Credentials file error: verify with echo $GOOGLE_APPLICATION_CREDENTIALS and ls -la ml-sa-key.json
  • NameError: name 'image_object' is not defined: TBD #1 is missing; add image_object = vision.Image(content=file_content)
  • NameError: name 'response' is not defined: TBD #2 is missing; add the vision_client.document_text_detection() call
  • NameError: name 'translation' is not defined: TBD #3 is missing; add the translate_client.translate() call
  • Empty BigQuery table: confirm you uncommented errors = bq_client.insert_rows(table, rows_for_bq)
  • AssertionError on assert errors == []: check that the BigQuery table image_text_detail exists in dataset image_classification_dataset
  • Script argument error: ensure you pass both arguments: python3 analyze-images-v2.py $PROJECT_ID $PROJECT_ID

Key Learnings

  • Service accounts are the standard way to provide application-level credentials in GCP. Each service account can have granular IAM roles scoped to specific services.
  • GOOGLE_APPLICATION_CREDENTIALS is the universal environment variable that all Google Cloud client libraries check for authentication.
  • The Vision API requires an Image object created from raw bytes — you can't pass the bytes directly to the detection method.
  • The Vision API's document_text_detection returns a structured response where the first element in text_annotations contains the full detected text and its locale.
  • The Translation API's translate() method returns a dictionary with translatedText, detectedSourceLanguage, and input keys.
  • BigQuery's insert_rows() performs streaming inserts and returns an empty list on success.
  • Always read the existing code before modifying — variable names like vision_client, desc, and image_object are defined by the script and must be used exactly as expected.
  • Use roles/storage.objectAdmin instead of roles/storage.admin — it grants object-level read/write/delete without unnecessary bucket-level management permissions.

Best Practices

  1. Principle of least privilege: Only grant the roles your service account actually needs (dataEditor for BigQuery writes, storage.objectAdmin for GCS object access, serviceUsageConsumer for API consumption).
  2. Test incrementally: Run the script after each modification to catch errors early rather than debugging everything at once.
  3. Environment variables for credentials: Never hard-code paths to credential files in your scripts.
  4. Read the existing code carefully: Variable names matter — using vision_client vs client or desc vs text can cause NameError exceptions.
  5. Use document_text_detection over text_detection when dealing with dense text in images — it uses a more advanced OCR model.

Conclusion

This challenge lab walks you through a realistic ML pipeline pattern: ingest raw data (images), enrich it using ML APIs (Vision + Translation), and store structured results for analysis (BigQuery). These same building blocks — Cloud Storage for data lake, ML APIs for enrichment, BigQuery for analytics — appear in production architectures across industries. Mastering this flow gives you a solid foundation for building more complex ML data pipelines on Google Cloud.
