
Solving "Use Machine Learning APIs on Google Cloud: Challenge Lab" — A Complete Guide

Introduction

This challenge lab tests your ability to build an end-to-end pipeline that extracts text from images using the Cloud Vision API, translates it with the Cloud Translation API, and loads the results into BigQuery. Unlike guided labs, you're expected to fill in the blanks of a partially written Python script and configure IAM permissions yourself.

Let's walk through every task with clear explanations of why each step matters.


The Architecture

The pipeline works like this:

  1. A Python script reads image files from a Cloud Storage bucket
  2. Each image is sent to the Cloud Vision API for text detection
  3. The extracted text is saved back to Cloud Storage as a .txt file
  4. If a language locale was detected (locale != ''), the text is sent to the Translation API to get a Japanese translation; text with no detected locale is kept as-is
  5. All results (original text, locale, translation) are uploaded to a BigQuery table.
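The per-image control flow can be sketched in plain Python. This is a hedged sketch, not the lab script itself: ocr, translate, and the sample values are placeholders standing in for the real Vision/Translation API calls.

```python
# Sketch of the per-image pipeline logic. The ocr and translate callables
# are placeholders for the real Vision and Translation API calls.
def process_image(name, ocr, translate, rows):
    text, locale = ocr(name)          # Vision API: extract text + language code
    if locale == '':
        translated = text             # no language detected: keep original text
    else:
        translated = translate(text)  # Translation API: translate to Japanese
    rows.append((text, locale, translated, name))

rows = []
# Fabricated sample inputs to exercise both branches:
process_image('sign1.jpg', lambda n: ('Hello', 'en'), lambda t: 'こんにちは', rows)
process_image('sign2.jpg', lambda n: ('???', ''), lambda t: t, rows)
```

Each buffered tuple later becomes one BigQuery row, which is exactly how the real script accumulates its results.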



Task 1: Configure a Service Account

Why a Service Account?

The Python script needs programmatic access to Vision API, Translation API, Cloud Storage, and BigQuery. A service account acts as the script's identity, and IAM roles define what it can do.

Commands

# Set your project ID
export PROJECT_ID=$(gcloud config get-value project)

# Create the service account
gcloud iam service-accounts create my-ml-sa \
  --display-name="ML API Service Account"

# Grant BigQuery Data Editor role (to insert rows)
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:my-ml-sa@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataEditor"

# Grant Cloud Storage Object Admin role (to read images and write text files)
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:my-ml-sa@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role="roles/storage.objectAdmin"

# Grant Service Usage Consumer role (required to make API calls within the project)
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:my-ml-sa@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role="roles/serviceusage.serviceUsageConsumer"

Important: Without roles/serviceusage.serviceUsageConsumer, the service account cannot consume any enabled APIs in the project (BigQuery, Vision, Translation, etc.), even if it has data-level roles like dataEditor or storage.objectAdmin. This results in a 403 USER_PROJECT_DENIED error.

Verification

gcloud projects get-iam-policy $PROJECT_ID \
  --flatten="bindings[].members" \
  --filter="bindings.members:my-ml-sa@"

You should see roles/bigquery.dataEditor, roles/storage.objectAdmin, and roles/serviceusage.serviceUsageConsumer listed.


Task 2: Create and Download Credentials

Why Download a Key?

While Cloud Shell has default credentials for the logged-in user, the challenge explicitly requires you to create a JSON key file and point the GOOGLE_APPLICATION_CREDENTIALS environment variable to it. This simulates how credentials work in production environments outside GCP.
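The client libraries read GOOGLE_APPLICATION_CREDENTIALS at startup, and the lab script performs the same sanity check before doing anything else. A standalone version of that guard, using only the standard library (the function name is my own, not from the script):

```python
import os

def credentials_path_ok(env=os.environ):
    """Mirror the script's guard: the env var must be set AND point to a real file."""
    path = env.get('GOOGLE_APPLICATION_CREDENTIALS')
    if path is None:
        return False, 'GOOGLE_APPLICATION_CREDENTIALS is not defined'
    if not os.path.exists(path):
        return False, 'The GOOGLE_APPLICATION_CREDENTIALS file does not exist'
    return True, path
```

Passing a dict instead of os.environ makes the check easy to test without touching your shell environment.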

Commands

# Generate the JSON key file
gcloud iam service-accounts keys create ml-sa-key.json \
  --iam-account=my-ml-sa@${PROJECT_ID}.iam.gserviceaccount.com

# Set the environment variable so Google Cloud client libraries find the key
export GOOGLE_APPLICATION_CREDENTIALS=${PWD}/ml-sa-key.json

Task 3: Modify the Script — Vision API Text Detection

Get the Script

gsutil cp gs://$PROJECT_ID/analyze-images-v2.py .

What to Modify

The script has four sections that need your attention: three # TBD: comments and one commented-out BigQuery upload line. Open the script with:

nano analyze-images-v2.py

TBD #1 — Create a Vision API image object:

Find the comment:

# TBD: Create a Vision API image object called image_object

Add below it:

image_object = vision.Image(content=file_content)

This creates an Image object from the raw bytes downloaded from Cloud Storage (file_content). The Vision API requires this object format to process images.

TBD #2 — Call the Vision API to detect text:

Find the comment:

# TBD: Detect text in the image and save the response data into an object called response

Add below it:

response = vision_client.document_text_detection(image=image_object)

This sends the image to the Vision API's document_text_detection method, whose OCR model is optimized for dense, document-style text and handles the lab's sign images well. Note that the client variable is called vision_client (as defined earlier in the script), and the image parameter uses the image_object we just created.
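What matters for the next steps is the shape of the response. A stand-in built from plain Python objects (fabricated sample data, not real API types or output) illustrates the parts the script relies on:

```python
from types import SimpleNamespace

# Stand-in for a Vision API response: text_annotations[0] carries the full
# concatenated text plus the detected locale; later entries are word-level.
response = SimpleNamespace(text_annotations=[
    SimpleNamespace(description='STOP\nHERE', locale='en'),  # full text
    SimpleNamespace(description='STOP', locale=''),          # word-level
    SimpleNamespace(description='HERE', locale=''),
])

desc = response.text_annotations[0].description    # 'STOP\nHERE'
locale = response.text_annotations[0].locale       # 'en'
```

The real response objects are protocol-buffer messages, but the attribute access pattern is the same one the script uses.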

Test It

Run the script after completing TBDs #1 and #2 to verify text extraction works before moving on:

python3 analyze-images-v2.py $PROJECT_ID $PROJECT_ID

You should see extracted text appearing in the console output.


Task 4: Modify the Script — Translation API

What to Modify

TBD #3 — Translate non-Japanese text to Japanese:

Find the comment:

# TBD: According to the target language pass the description data to the translation API

Add below it:

translation = translate_client.translate(desc, target_language='ja')

Key details:

  • We use desc (not a generic variable like text) because that's the variable name the script assigns to the extracted description earlier: desc = response.text_annotations[0].description
  • The target language is 'ja' (Japanese) as specified in the lab instructions
  • The result is stored in translation, and the script already accesses translation['translatedText'] on the next line
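The access pattern on the returned dictionary can be shown with plain Python. The values below are fabricated sample data, not real API output:

```python
# The Translation API's translate() returns a dict; the script reads the
# 'translatedText' key on the line after the TBD.
translation = {
    'translatedText': 'こんにちは',
    'detectedSourceLanguage': 'en',
    'input': 'Hello',
}
translated_text = translation['translatedText']
```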

Enable the BigQuery Upload

At the very end of the script, find the commented-out line:

# errors = bq_client.insert_rows(table, rows_for_bq)

Remove the # to enable it:

errors = bq_client.insert_rows(table, rows_for_bq)

The line immediately after (assert errors == []) will verify the upload succeeded.

Complete Modified Script Reference

Here's a summary of all four changes in the script:

  • After # TBD: Create a Vision API image object → add image_object = vision.Image(content=file_content)
  • After # TBD: Detect text in the image → add response = vision_client.document_text_detection(image=image_object)
  • After # TBD: According to the target language → add translation = translate_client.translate(desc, target_language='ja')
  • Last commented line → remove the # from errors = bq_client.insert_rows(table, rows_for_bq)

Run the Complete Script

python3 analyze-images-v2.py $PROJECT_ID $PROJECT_ID

Watch the output — you should see text being extracted from each image, locale detection, and Japanese translations for non-Japanese text, followed by "Writing Vision API image data to BigQuery..."


Understanding the Python Script (analyze-images-v2.py)

Before modifying the script, it's important to understand what it does. Here's a general overview followed by a line-by-line breakdown.

General Overview

The script is an automated image-processing pipeline. It connects to four Google Cloud services simultaneously: Cloud Storage (to read images and write text files), Vision API (to extract text from images via OCR), Translation API (to translate non-Japanese text into Japanese), and BigQuery (to store the final results in a queryable table).

The workflow for each image is: download the image bytes from the bucket → send them to the Vision API → save the detected text back to Cloud Storage as a .txt file → check the language locale → if not Japanese, translate to Japanese → collect all results → batch-upload everything to BigQuery at the end.

Line-by-Line Breakdown

# Dataset: image_classification_dataset
# Table name: image_text_detail
import os
import sys

Lines 1-4: Comments documenting the target BigQuery dataset/table. Imports os (to read environment variables) and sys (to read command-line arguments).

from google.cloud import storage, bigquery, language, vision, translate_v2

Line 7: Imports the five Google Cloud client libraries. storage for Cloud Storage, bigquery for BigQuery, language for Natural Language API (not used in this script but imported from the original template), vision for Vision API, and translate_v2 for the Translation API.

if ('GOOGLE_APPLICATION_CREDENTIALS' in os.environ):
    if (not os.path.exists(os.environ['GOOGLE_APPLICATION_CREDENTIALS'])):
        print ("The GOOGLE_APPLICATION_CREDENTIALS file does not exist.\n")
        exit()
else:
    print ("The GOOGLE_APPLICATION_CREDENTIALS environment variable is not defined.\n")
    exit()

Lines 9-15: Credentials check. Verifies two things: (1) the GOOGLE_APPLICATION_CREDENTIALS environment variable is set, and (2) the file it points to actually exists on disk. If either check fails, the script exits immediately with an error message. This is a safety gate — without valid credentials, no API call will work.

if len(sys.argv)<3:
    print('You must provide parameters for the Google Cloud project ID and Storage bucket')
    print ('python3 '+sys.argv[0]+ '[PROJECT_NAME] [BUCKET_NAME]')
    exit()

project_name = sys.argv[1]
bucket_name = sys.argv[2]

Lines 17-23: Argument parsing. The script requires two command-line arguments: the GCP project ID and the Cloud Storage bucket name. In this lab, both are the same value (your project ID). If you forget to pass them, the script prints usage instructions and exits.
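That check is simple enough to factor into a testable function. This is a sketch of my own (the lab script does the check inline, not via a helper):

```python
def parse_args(argv):
    """Return (project, bucket) from the argument list, or None if too few were given."""
    if len(argv) < 3:
        print('You must provide parameters for the Google Cloud project ID and Storage bucket')
        return None
    return argv[1], argv[2]
```

In the lab, both values are your project ID, so the call is parse_args(['analyze-images-v2.py', project_id, project_id]).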

storage_client = storage.Client()
bq_client = bigquery.Client(project=project_name)
nl_client = language.LanguageServiceClient()

Lines 26-28: Client initialization (part 1). Creates client objects for Cloud Storage, BigQuery (bound to your project), and the Natural Language API. The nl_client is inherited from the original template but not used in this challenge.

vision_client = vision.ImageAnnotatorClient()
translate_client = translate_v2.Client()

Lines 31-32: Client initialization (part 2). Creates the Vision API client (for text detection) and the Translation API client (for translating text). These are the two ML API clients you'll use in the TBD sections.

dataset_ref = bq_client.dataset('image_classification_dataset')
dataset = bigquery.Dataset(dataset_ref)
table_ref = dataset.table('image_text_detail')
table = bq_client.get_table(table_ref)

Lines 35-38: BigQuery table setup. Creates a reference chain: dataset name → dataset object → table name → table object. The get_table() call actually contacts BigQuery to verify the table exists and retrieves its schema. This is where the 403 USER_PROJECT_DENIED error occurs if the service account lacks the serviceUsageConsumer role.

rows_for_bq = []

Line 41: Results buffer. Initializes an empty list that will accumulate tuples of (description, locale, translated_text, filename) for each processed image. These get batch-uploaded to BigQuery at the end.

files = storage_client.bucket(bucket_name).list_blobs()
bucket = storage_client.bucket(bucket_name)

Lines 44-45: Bucket access. list_blobs() returns an iterator over every file (blob) in the bucket. The bucket object is saved separately because we'll need it later to upload text files.

print('Processing image files from GCS. This will take a few minutes..')

Line 47: Status message so you know the script is working.

for file in files:
    if file.name.endswith('jpg') or  file.name.endswith('png'):
        file_content = file.download_as_string()

Lines 50-52: Main loop start. Iterates over every blob in the bucket, filters for image files (.jpg or .png), and downloads the image as raw bytes into file_content.

        # TBD: Create a Vision API image object called image_object
        image_object = vision.Image(content=file_content)    # ← YOU ADD THIS

Line 55 (TBD #1): Wraps the raw image bytes into a vision.Image object. The Vision API cannot accept raw bytes directly — it needs this structured object that can hold either image bytes (content) or a GCS URI (source).

        # TBD: Detect text in the image and save the response data into an object called response
        response = vision_client.document_text_detection(image=image_object)    # ← YOU ADD THIS

Line 59 (TBD #2): Sends the image to the Vision API's document_text_detection method. This performs OCR (Optical Character Recognition) optimized for dense text. The response contains a list of text_annotations — the first element holds the full concatenated text and the detected language.

        text_data = response.text_annotations[0].description

Line 62: Extracts the full detected text from the first annotation. The text_annotations array always puts the complete text in index [0], with individual word-level detections in subsequent indices.

        file_name = file.name.split('.')[0] + '.txt'
        blob = bucket.blob(file_name)
        blob.upload_from_string(text_data, content_type='text/plain')

Lines 65-67: Save text to Cloud Storage. Converts the image filename (e.g., sign1.jpg) to a text filename (sign1.txt), creates a blob reference, and uploads the extracted text. This creates a text file in the same bucket for each processed image.
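The filename conversion is plain string manipulation and easy to verify in isolation (the helper name is mine, for illustration):

```python
def to_text_name(image_name):
    """sign1.jpg -> sign1.txt; split('.') keeps only the part before the FIRST dot."""
    return image_name.split('.')[0] + '.txt'
```

Note the caveat in the comment: a name with extra dots (e.g. a.b.png) is truncated at the first dot. That's harmless for the lab's simple filenames but worth knowing if you reuse the pattern.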

        desc = response.text_annotations[0].description
        locale = response.text_annotations[0].locale

Lines 72-73: Extracts the description (full text) and locale (language code like 'en', 'ja', 'fr') from the response. Note that desc is the same value as text_data — the script extracts it again for clarity of variable naming.

        if locale == '':
            translated_text = desc
        else:
            # TBD: According to the target language pass the description data to the translation API
            translation = translate_client.translate(desc, target_language='ja')    # ← YOU ADD THIS

            translated_text = translation['translatedText']

Lines 77-83 (TBD #3): Translation logic. If the locale is empty (no language detected), the original text is used as-is. Otherwise, the text is sent to the Translation API with target_language='ja' (Japanese). The API returns a dictionary; the translated text is in the 'translatedText' key.

        print(translated_text)

Line 84: Prints the translated (or original) text to the console so you can monitor progress.

        if len(response.text_annotations) > 0:
            rows_for_bq.append((desc, locale, translated_text, file.name))

Lines 88-89: Collect results. If the Vision API found any text (safety check), appends a tuple with the original text, locale, translated text, and filename to the results buffer. This tuple matches the BigQuery table schema.
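Each buffered row is a plain tuple whose element order must match the table's column order. Here is a fabricated example of what rows_for_bq might hold after two images (sample values, not real lab output):

```python
# (description, locale, translated_text, file_name) — the order must match the
# image_text_detail table schema.
rows_for_bq = [
    ('STOP', 'en', '止まれ', 'sign1.jpg'),
    ('出口', '', '出口', 'sign2.jpg'),   # no locale detected: text kept as-is
]
```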

print('Writing Vision API image data to BigQuery...')
errors = bq_client.insert_rows(table, rows_for_bq)    # ← YOU UNCOMMENT THIS
assert errors == []

Lines 91-93: BigQuery upload. After all images are processed, uses insert_rows() to perform a streaming insert of all collected rows into the BigQuery table. The assert verifies that no errors occurred — if any row failed to insert, the script crashes with an AssertionError.


Task 5: Validate with BigQuery

Run the Verification Query

Go to BigQuery in the Console or use the CLI:

bq query --use_legacy_sql=false \
  'SELECT locale, COUNT(locale) as lcount FROM image_classification_dataset.image_text_detail GROUP BY locale ORDER BY lcount DESC'

You should see a breakdown of language codes (e.g., ja, en, fr, de) with their counts. This confirms the full pipeline worked end-to-end.


Quick Reference — All Commands in Order

# ============================================
# TASK 1: Create service account + bind roles
# ============================================
export PROJECT_ID=$(gcloud config get-value project)

gcloud iam service-accounts create my-ml-sa \
  --display-name="ML API Service Account"

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:my-ml-sa@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataEditor"

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:my-ml-sa@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role="roles/storage.objectAdmin"

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:my-ml-sa@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role="roles/serviceusage.serviceUsageConsumer"

# ============================================
# TASK 2: Create credentials + set env var
# ============================================
gcloud iam service-accounts keys create ml-sa-key.json \
  --iam-account=my-ml-sa@${PROJECT_ID}.iam.gserviceaccount.com

export GOOGLE_APPLICATION_CREDENTIALS=${PWD}/ml-sa-key.json

# ============================================
# TASK 3 & 4: Copy and modify the script
# ============================================
gsutil cp gs://$PROJECT_ID/analyze-images-v2.py .
nano analyze-images-v2.py

# --- Inside nano, make these 4 edits: ---
# 1. After "TBD: Create a Vision API image object":
#        image_object = vision.Image(content=file_content)
#
# 2. After "TBD: Detect text in the image":
#        response = vision_client.document_text_detection(image=image_object)
#
# 3. After "TBD: According to the target language":
#        translation = translate_client.translate(desc, target_language='ja')
#
# 4. Uncomment the last line:
#        errors = bq_client.insert_rows(table, rows_for_bq)
# --- Save with Ctrl+O, Enter, Ctrl+X ---

# ============================================
# TASK 5: Run script and validate
# ============================================
python3 analyze-images-v2.py $PROJECT_ID $PROJECT_ID

bq query --use_legacy_sql=false \
  'SELECT locale, COUNT(locale) as lcount FROM image_classification_dataset.image_text_detail GROUP BY locale ORDER BY lcount DESC'

Troubleshooting

  • 403 USER_PROJECT_DENIED on BigQuery or API calls: add the missing role with gcloud projects add-iam-policy-binding $PROJECT_ID --member="serviceAccount:my-ml-sa@${PROJECT_ID}.iam.gserviceaccount.com" --role="roles/serviceusage.serviceUsageConsumer", then wait 1-2 minutes for propagation
  • 403 ACCESS_DENIED on Cloud Storage: you may have used roles/storage.admin instead of roles/storage.objectAdmin; bind the correct role
  • PERMISSION_DENIED on Vision/Translate API calls: enable the APIs with gcloud services enable vision.googleapis.com translate.googleapis.com
  • PERMISSION_DENIED on BigQuery: verify the dataEditor role was bound correctly; wait 1-2 minutes for IAM propagation
  • ModuleNotFoundError: install packages with pip3 install google-cloud-vision google-cloud-translate google-cloud-bigquery google-cloud-storage google-cloud-language
  • Credentials file error: verify with echo $GOOGLE_APPLICATION_CREDENTIALS and ls -la ml-sa-key.json
  • NameError: name 'image_object' is not defined: TBD #1 is missing; add image_object = vision.Image(content=file_content)
  • NameError: name 'response' is not defined: TBD #2 is missing; add the vision_client.document_text_detection() call
  • NameError: name 'translation' is not defined: TBD #3 is missing; add the translate_client.translate() call
  • Empty BigQuery table: confirm you uncommented errors = bq_client.insert_rows(table, rows_for_bq)
  • AssertionError on assert errors == []: check that the BigQuery table image_text_detail exists in dataset image_classification_dataset
  • Script argument error: ensure you pass both arguments: python3 analyze-images-v2.py $PROJECT_ID $PROJECT_ID

Key Learnings

  • Service accounts are the standard way to provide application-level credentials in GCP. Each service account can have granular IAM roles scoped to specific services.
  • GOOGLE_APPLICATION_CREDENTIALS is the universal environment variable that all Google Cloud client libraries check for authentication.
  • The Vision API requires an Image object created from raw bytes — you can't pass the bytes directly to the detection method.
  • The Vision API's document_text_detection returns a structured response where the first element in text_annotations contains the full detected text and its locale.
  • The Translation API's translate() method returns a dictionary with translatedText, detectedSourceLanguage, and input keys.
  • BigQuery's insert_rows() performs streaming inserts and returns an empty list on success.
  • Always read the existing code before modifying — variable names like vision_client, desc, and image_object are defined by the script and must be used exactly as expected.
  • Use roles/storage.objectAdmin instead of roles/storage.admin — it grants object-level read/write/delete without unnecessary bucket-level management permissions.

Best Practices

  1. Principle of least privilege: Only grant the roles your service account actually needs (dataEditor for BigQuery writes, storage.objectAdmin for GCS object access, serviceUsageConsumer for API consumption).
  2. Test incrementally: Run the script after each modification to catch errors early rather than debugging everything at once.
  3. Environment variables for credentials: Never hard-code paths to credential files in your scripts.
  4. Read the existing code carefully: Variable names matter — using vision_client vs client or desc vs text can cause NameError exceptions.
  5. Use document_text_detection over text_detection when dealing with dense text in images — it uses a more advanced OCR model.

Conclusion

This challenge lab walks you through a realistic ML pipeline pattern: ingest raw data (images), enrich it using ML APIs (Vision + Translation), and store structured results for analysis (BigQuery). These same building blocks — Cloud Storage for data lake, ML APIs for enrichment, BigQuery for analytics — appear in production architectures across industries. Mastering this flow gives you a solid foundation for building more complex ML data pipelines on Google Cloud.
