Introduction
This challenge lab tests your ability to build an end-to-end pipeline that extracts text from images using the Cloud Vision API, translates it with the Cloud Translation API, and loads the results into BigQuery. Unlike guided labs, you're expected to fill in the blanks of a partially written Python script and configure IAM permissions yourself.
Let's walk through every task with clear explanations of why each step matters.
The Architecture
The pipeline works like this:
- A Python script reads image files from a Cloud Storage bucket
- Each image is sent to the Cloud Vision API for text detection
- The extracted text is saved back to Cloud Storage as a .txt file
- If the text is not in Japanese (locale != 'ja'), it's sent to the Translation API to get a Japanese translation
- All results (original text, locale, translation) are uploaded to a BigQuery table
Task 1: Configure a Service Account
Why a Service Account?
The Python script needs programmatic access to Vision API, Translation API, Cloud Storage, and BigQuery. A service account acts as the script's identity, and IAM roles define what it can do.
Commands
# Set your project ID
export PROJECT_ID=$(gcloud config get-value project)
# Create the service account
gcloud iam service-accounts create my-ml-sa \
--display-name="ML API Service Account"
# Grant BigQuery Data Editor role (to insert rows)
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member="serviceAccount:my-ml-sa@${PROJECT_ID}.iam.gserviceaccount.com" \
--role="roles/bigquery.dataEditor"
# Grant Cloud Storage Object Admin role (to read images and write text files)
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member="serviceAccount:my-ml-sa@${PROJECT_ID}.iam.gserviceaccount.com" \
--role="roles/storage.objectAdmin"
# Grant Service Usage Consumer role (required to make API calls within the project)
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member="serviceAccount:my-ml-sa@${PROJECT_ID}.iam.gserviceaccount.com" \
--role="roles/serviceusage.serviceUsageConsumer"
Important: Without roles/serviceusage.serviceUsageConsumer, the service account cannot consume any enabled APIs in the project (BigQuery, Vision, Translation, etc.), even if it has data-level roles like dataEditor or storage.objectAdmin. This results in a 403 USER_PROJECT_DENIED error.
Verification
gcloud projects get-iam-policy $PROJECT_ID \
--flatten="bindings[].members" \
--filter="bindings.members:my-ml-sa@"
You should see roles/bigquery.dataEditor, roles/storage.objectAdmin, and roles/serviceusage.serviceUsageConsumer listed.
Task 2: Create and Download Credentials
Why Download a Key?
While Cloud Shell has default credentials for the logged-in user, the challenge explicitly requires you to create a JSON key file and point the GOOGLE_APPLICATION_CREDENTIALS environment variable to it. This simulates how credentials work in production environments outside GCP.
Commands
# Generate the JSON key file
gcloud iam service-accounts keys create ml-sa-key.json \
--iam-account=my-ml-sa@${PROJECT_ID}.iam.gserviceaccount.com
# Set the environment variable so Google Cloud client libraries find the key
export GOOGLE_APPLICATION_CREDENTIALS=${PWD}/ml-sa-key.json
Task 3: Modify the Script — Vision API Text Detection
Get the Script
gsutil cp gs://$PROJECT_ID/analyze-images-v2.py .
What to Modify
The script has four sections that need your attention: three # TBD: comments and one commented-out BigQuery upload line. Open the script with:
nano analyze-images-v2.py
TBD #1 — Create a Vision API image object:
Find the comment:
# TBD: Create a Vision API image object called image_object
Add below it:
image_object = vision.Image(content=file_content)
This creates an Image object from the raw bytes downloaded from Cloud Storage (file_content). The Vision API requires this object format to process images.
TBD #2 — Call the Vision API to detect text:
Find the comment:
# TBD: Detect text in the image and save the response data into an object called response
Add below it:
response = vision_client.document_text_detection(image=image_object)
This sends the image to the Vision API's document_text_detection method, which is optimized for dense text like signs. Note that the client variable is called vision_client (as defined earlier in the script), and the image parameter uses the image_object we just created.
Test It
Run the script after completing TBDs #1 and #2 to verify text extraction works before moving on:
python3 analyze-images-v2.py $PROJECT_ID $PROJECT_ID
You should see extracted text appearing in the console output.
Task 4: Modify the Script — Translation API
What to Modify
TBD #3 — Translate non-Japanese text to Japanese:
Find the comment:
# TBD: According to the target language pass the description data to the translation API
Add below it:
translation = translate_client.translate(desc, target_language='ja')
Key details:
- We use desc (not a generic variable like text) because that's the variable name the script assigns to the extracted description earlier: desc = response.text_annotations[0].description
- The target language is 'ja' (Japanese), as specified in the lab instructions
- The result is stored in translation, and the script already accesses translation['translatedText'] on the next line
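For reference, the dictionary returned by translate() has the general shape below. This is a hardcoded sample for illustration, not real API output:

```python
# Illustrative shape of the Translation API (translate_v2) response.
# The values in this dict are made up for demonstration purposes.
translation = {
    "translatedText": "制限",          # the Japanese translation
    "detectedSourceLanguage": "en",    # included when the source language is auto-detected
    "input": "limit",                  # the original text that was passed in
}

translated_text = translation["translatedText"]  # the key the script reads next
```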
Enable the BigQuery Upload
At the very end of the script, find the commented-out line:
# errors = bq_client.insert_rows(table, rows_for_bq)
Remove the # to enable it:
errors = bq_client.insert_rows(table, rows_for_bq)
The line immediately after (assert errors == []) will verify the upload succeeded.
Complete Modified Script Reference
Here's a summary of all four changes in the script:
| Location in Script | What to Add / Change |
|---|---|
| After # TBD: Create a Vision API image object | image_object = vision.Image(content=file_content) |
| After # TBD: Detect text in the image | response = vision_client.document_text_detection(image=image_object) |
| After # TBD: According to the target language | translation = translate_client.translate(desc, target_language='ja') |
| Last commented line | Remove # from errors = bq_client.insert_rows(table, rows_for_bq) |
Run the Complete Script
python3 analyze-images-v2.py $PROJECT_ID $PROJECT_ID
Watch the output — you should see text being extracted from each image, locale detection, and Japanese translations for non-Japanese text, followed by "Writing Vision API image data to BigQuery..."
Understanding the Python Script (analyze-images-v2.py)
Before modifying the script, it's important to understand what it does. Here's a general overview followed by a line-by-line breakdown.
General Overview
The script is an automated image-processing pipeline. It connects to four Google Cloud services simultaneously: Cloud Storage (to read images and write text files), Vision API (to extract text from images via OCR), Translation API (to translate non-Japanese text into Japanese), and BigQuery (to store the final results in a queryable table).
The workflow for each image is: download the image bytes from the bucket → send them to the Vision API → save the detected text back to Cloud Storage as a .txt file → check the language locale → if not Japanese, translate to Japanese → collect all results → batch-upload everything to BigQuery at the end.
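The per-image workflow above can be sketched as a small, runnable Python outline. The stub functions here stand in for the real Cloud clients (they are not the google-cloud library APIs) and return canned values purely for illustration:

```python
# Minimal sketch of the per-image control flow described above.
# detect_text() and translate_to_japanese() are STAND-INS for the real
# Vision and Translation clients; their return values are canned.

def detect_text(image_bytes):
    # Stand-in for vision_client.document_text_detection(image=...)
    return {"description": "STOP", "locale": "en"}

def translate_to_japanese(text):
    # Stand-in for translate_client.translate(text, target_language='ja')
    return {"translatedText": "<ja translation of %s>" % text}

def process_image(name, image_bytes):
    annotation = detect_text(image_bytes)
    desc, locale = annotation["description"], annotation["locale"]
    # (The real script also writes desc back to GCS as <name>.txt here.)
    if locale == '':
        translated = desc  # no language detected: keep the original text
    else:
        translated = translate_to_japanese(desc)["translatedText"]
    return (desc, locale, translated, name)  # matches the BigQuery row shape

rows_for_bq = [process_image("sign1.jpg", b"\xff\xd8...")]
```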
Line-by-Line Breakdown
# Dataset: image_classification_dataset
# Table name: image_text_detail
import os
import sys
Lines 1-4: Comments documenting the target BigQuery dataset/table. Imports os (to read environment variables) and sys (to read command-line arguments).
from google.cloud import storage, bigquery, language, vision, translate_v2
Line 7: Imports the five Google Cloud client libraries. storage for Cloud Storage, bigquery for BigQuery, language for Natural Language API (not used in this script but imported from the original template), vision for Vision API, and translate_v2 for the Translation API.
if ('GOOGLE_APPLICATION_CREDENTIALS' in os.environ):
if (not os.path.exists(os.environ['GOOGLE_APPLICATION_CREDENTIALS'])):
print ("The GOOGLE_APPLICATION_CREDENTIALS file does not exist.\n")
exit()
else:
print ("The GOOGLE_APPLICATION_CREDENTIALS environment variable is not defined.\n")
exit()
Lines 9-15: Credentials check. Verifies two things: (1) the GOOGLE_APPLICATION_CREDENTIALS environment variable is set, and (2) the file it points to actually exists on disk. If either check fails, the script exits immediately with an error message. This is a safety gate — without valid credentials, no API call will work.
if len(sys.argv)<3:
print('You must provide parameters for the Google Cloud project ID and Storage bucket')
print ('python3 '+sys.argv[0]+ '[PROJECT_NAME] [BUCKET_NAME]')
exit()
project_name = sys.argv[1]
bucket_name = sys.argv[2]
Lines 17-23: Argument parsing. The script requires two command-line arguments: the GCP project ID and the Cloud Storage bucket name. In this lab, both are the same value (your project ID). If you forget to pass them, the script prints usage instructions and exits.
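The same guard can be isolated as a tiny testable helper (a sketch, not part of the lab script):

```python
def parse_args(argv):
    # Mirrors the script's check: two positional arguments are required
    # after the script name (project ID and bucket name).
    if len(argv) < 3:
        raise SystemExit(
            'Usage: python3 ' + argv[0] + ' [PROJECT_NAME] [BUCKET_NAME]')
    return argv[1], argv[2]

# In this lab both values are the project ID:
project_name, bucket_name = parse_args(
    ['analyze-images-v2.py', 'my-project', 'my-project'])
```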
storage_client = storage.Client()
bq_client = bigquery.Client(project=project_name)
nl_client = language.LanguageServiceClient()
Lines 26-28: Client initialization (part 1). Creates client objects for Cloud Storage, BigQuery (bound to your project), and the Natural Language API. The nl_client is inherited from the original template but not used in this challenge.
vision_client = vision.ImageAnnotatorClient()
translate_client = translate_v2.Client()
Lines 31-32: Client initialization (part 2). Creates the Vision API client (for text detection) and the Translation API client (for translating text). These are the two ML API clients you'll use in the TBD sections.
dataset_ref = bq_client.dataset('image_classification_dataset')
dataset = bigquery.Dataset(dataset_ref)
table_ref = dataset.table('image_text_detail')
table = bq_client.get_table(table_ref)
Lines 35-38: BigQuery table setup. Creates a reference chain: dataset name → dataset object → table name → table object. The get_table() call actually contacts BigQuery to verify the table exists and retrieves its schema. This is where the 403 USER_PROJECT_DENIED error occurs if the service account lacks the serviceUsageConsumer role.
rows_for_bq = []
Line 41: Results buffer. Initializes an empty list that will accumulate tuples of (description, locale, translated_text, filename) for each processed image. These get batch-uploaded to BigQuery at the end.
files = storage_client.bucket(bucket_name).list_blobs()
bucket = storage_client.bucket(bucket_name)
Lines 44-45: Bucket access. list_blobs() returns an iterator over every file (blob) in the bucket. The bucket object is saved separately because we'll need it later to upload text files.
print('Processing image files from GCS. This will take a few minutes..')
Line 47: Status message so you know the script is working.
for file in files:
if file.name.endswith('jpg') or file.name.endswith('png'):
file_content = file.download_as_string()
Lines 50-52: Main loop start. Iterates over every blob in the bucket, filters for image files (.jpg or .png), and downloads the image as raw bytes into file_content.
# TBD: Create a Vision API image object called image_object
image_object = vision.Image(content=file_content) # ← YOU ADD THIS
Line 55 (TBD #1): Wraps the raw image bytes into a vision.Image object. The Vision API cannot accept raw bytes directly — it needs this structured object that can hold either image bytes (content) or a GCS URI (source).
# TBD: Detect text in the image and save the response data into an object called response
response = vision_client.document_text_detection(image=image_object) # ← YOU ADD THIS
Line 59 (TBD #2): Sends the image to the Vision API's document_text_detection method. This performs OCR (Optical Character Recognition) optimized for dense text. The response contains a list of text_annotations — the first element holds the full concatenated text and the detected language.
text_data = response.text_annotations[0].description
Line 62: Extracts the full detected text from the first annotation. The text_annotations array always puts the complete text in index [0], with individual word-level detections in subsequent indices.
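You can see the indexing convention with a mocked response object. SimpleNamespace stands in for the real protobuf message here; the attribute access pattern is the same:

```python
from types import SimpleNamespace

# Mocked Vision API response: index [0] carries the full concatenated
# text and its locale; later entries are individual word detections.
response = SimpleNamespace(text_annotations=[
    SimpleNamespace(description="ONE WAY", locale="en"),
    SimpleNamespace(description="ONE", locale=""),
    SimpleNamespace(description="WAY", locale=""),
])

text_data = response.text_annotations[0].description
locale = response.text_annotations[0].locale
```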
file_name = file.name.split('.')[0] + '.txt'
blob = bucket.blob(file_name)
blob.upload_from_string(text_data, content_type='text/plain')
Lines 65-67: Save text to Cloud Storage. Converts the image filename (e.g., sign1.jpg) to a text filename (sign1.txt), creates a blob reference, and uploads the extracted text. This creates a text file in the same bucket for each processed image.
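One caveat worth noting: split('.')[0] keeps only the part before the first dot, so an object name containing extra dots would be shortened. A short sketch contrasting it with os.path.splitext (the helper names here are illustrative):

```python
import os

def to_txt_name(image_name):
    # The script's approach: keep everything before the FIRST dot.
    return image_name.split('.')[0] + '.txt'

def to_txt_name_safe(image_name):
    # Alternative: strip only the final extension, preserving dotted names.
    return os.path.splitext(image_name)[0] + '.txt'

print(to_txt_name("sign1.jpg"))         # sign1.txt
print(to_txt_name("sign.v2.jpg"))       # sign.txt
print(to_txt_name_safe("sign.v2.jpg"))  # sign.v2.txt
```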
desc = response.text_annotations[0].description
locale = response.text_annotations[0].locale
Lines 72-73: Extracts the description (full text) and locale (language code like 'en', 'ja', 'fr') from the response. Note that desc is the same value as text_data — the script extracts it again for clarity of variable naming.
if locale == '':
translated_text = desc
else:
# TBD: According to the target language pass the description data to the translation API
translation = translate_client.translate(desc, target_language='ja') # ← YOU ADD THIS
translated_text = translation['translatedText']
Lines 77-83 (TBD #3): Translation logic. If the locale is empty (no language detected), the original text is used as-is. Otherwise, the text is sent to the Translation API with target_language='ja' (Japanese). The API returns a dictionary; the translated text is in the 'translatedText' key.
print(translated_text)
Line 84: Prints the translated (or original) text to the console so you can monitor progress.
if len(response.text_annotations) > 0:
rows_for_bq.append((desc, locale, translated_text, file.name))
Lines 88-89: Collect results. If the Vision API found any text (safety check), appends a tuple with the original text, locale, translated text, and filename to the results buffer. This tuple matches the BigQuery table schema.
print('Writing Vision API image data to BigQuery...')
errors = bq_client.insert_rows(table, rows_for_bq) # ← YOU UNCOMMENT THIS
assert errors == []
Lines 91-93: BigQuery upload. After all images are processed, uses insert_rows() to perform a streaming insert of all collected rows into the BigQuery table. The assert verifies that no errors occurred — if any row failed to insert, the script crashes with an AssertionError.
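On failure, insert_rows() returns one mapping per failed row, each with an 'index' key (the failed row's position) and an 'errors' key (the details). If you'd rather see a readable summary than a bare AssertionError, a helper like this could replace the assert (an illustrative sketch, not part of the lab script):

```python
def report_insert_errors(errors):
    # insert_rows() returns [] on success; otherwise a list of mappings,
    # each with an 'index' (failed row number) and 'errors' (details).
    if not errors:
        return "all rows inserted"
    lines = ["%d row(s) failed to insert:" % len(errors)]
    for err in errors:
        lines.append("  row %s: %s" % (err.get("index"), err.get("errors")))
    return "\n".join(lines)

print(report_insert_errors([]))
print(report_insert_errors(
    [{"index": 0, "errors": [{"reason": "invalid"}]}]))
```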
Task 5: Validate with BigQuery
Run the Verification Query
Go to BigQuery in the Console or use the CLI:
bq query --use_legacy_sql=false \
'SELECT locale, COUNT(locale) as lcount FROM image_classification_dataset.image_text_detail GROUP BY locale ORDER BY lcount DESC'
You should see a breakdown of language codes (e.g., ja, en, fr, de) with their counts. This confirms the full pipeline worked end-to-end.
Quick Reference — All Commands in Order
# ============================================
# TASK 1: Create service account + bind roles
# ============================================
export PROJECT_ID=$(gcloud config get-value project)
gcloud iam service-accounts create my-ml-sa \
--display-name="ML API Service Account"
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member="serviceAccount:my-ml-sa@${PROJECT_ID}.iam.gserviceaccount.com" \
--role="roles/bigquery.dataEditor"
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member="serviceAccount:my-ml-sa@${PROJECT_ID}.iam.gserviceaccount.com" \
--role="roles/storage.objectAdmin"
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member="serviceAccount:my-ml-sa@${PROJECT_ID}.iam.gserviceaccount.com" \
--role="roles/serviceusage.serviceUsageConsumer"
# ============================================
# TASK 2: Create credentials + set env var
# ============================================
gcloud iam service-accounts keys create ml-sa-key.json \
--iam-account=my-ml-sa@${PROJECT_ID}.iam.gserviceaccount.com
export GOOGLE_APPLICATION_CREDENTIALS=${PWD}/ml-sa-key.json
# ============================================
# TASK 3 & 4: Copy and modify the script
# ============================================
gsutil cp gs://$PROJECT_ID/analyze-images-v2.py .
nano analyze-images-v2.py
# --- Inside nano, make these 4 edits: ---
# 1. After "TBD: Create a Vision API image object":
# image_object = vision.Image(content=file_content)
#
# 2. After "TBD: Detect text in the image":
# response = vision_client.document_text_detection(image=image_object)
#
# 3. After "TBD: According to the target language":
# translation = translate_client.translate(desc, target_language='ja')
#
# 4. Uncomment the last line:
# errors = bq_client.insert_rows(table, rows_for_bq)
# --- Save with Ctrl+O, Enter, Ctrl+X ---
# ============================================
# TASK 5: Run script and validate
# ============================================
python3 analyze-images-v2.py $PROJECT_ID $PROJECT_ID
bq query --use_legacy_sql=false \
'SELECT locale, COUNT(locale) as lcount FROM image_classification_dataset.image_text_detail GROUP BY locale ORDER BY lcount DESC'
Troubleshooting
| Problem | Solution |
|---|---|
| 403 USER_PROJECT_DENIED on BigQuery or API calls | Add the missing role: gcloud projects add-iam-policy-binding $PROJECT_ID --member="serviceAccount:my-ml-sa@${PROJECT_ID}.iam.gserviceaccount.com" --role="roles/serviceusage.serviceUsageConsumer"; wait 1-2 min for propagation |
| 403 ACCESS_DENIED on Cloud Storage | You may have used roles/storage.admin instead of roles/storage.objectAdmin. Fix: bind the correct role |
| PERMISSION_DENIED on Vision/Translate API calls | Enable the APIs: gcloud services enable vision.googleapis.com translate.googleapis.com |
| PERMISSION_DENIED on BigQuery | Verify the dataEditor role was bound correctly; wait 1-2 minutes for IAM propagation |
| ModuleNotFoundError | Install packages: pip3 install google-cloud-vision google-cloud-translate google-cloud-bigquery google-cloud-storage google-cloud-language |
| Credentials file error | Verify: echo $GOOGLE_APPLICATION_CREDENTIALS and ls -la ml-sa-key.json |
| NameError: name 'image_object' is not defined | TBD #1 is missing: add image_object = vision.Image(content=file_content) |
| NameError: name 'response' is not defined | TBD #2 is missing: add the vision_client.document_text_detection() call |
| NameError: name 'translation' is not defined | TBD #3 is missing: add the translate_client.translate() call |
| Empty BigQuery table | Confirm you uncommented errors = bq_client.insert_rows(table, rows_for_bq) |
| AssertionError on assert errors == [] | Check that the BigQuery table image_text_detail exists in dataset image_classification_dataset |
| Script argument error | Ensure you pass both arguments: python3 analyze-images-v2.py $PROJECT_ID $PROJECT_ID |
Key Learnings
- Service accounts are the standard way to provide application-level credentials in GCP. Each service account can have granular IAM roles scoped to specific services.
- GOOGLE_APPLICATION_CREDENTIALS is the universal environment variable that all Google Cloud client libraries check for authentication.
- The Vision API requires an Image object created from raw bytes; you can't pass the bytes directly to the detection method.
- The Vision API's document_text_detection returns a structured response where the first element in text_annotations contains the full detected text and its locale.
- The Translation API's translate() method returns a dictionary with translatedText, detectedSourceLanguage, and input keys.
- BigQuery's insert_rows() performs streaming inserts and returns an empty list on success.
- Always read the existing code before modifying: variable names like vision_client, desc, and image_object are defined by the script and must be used exactly as expected.
- Use roles/storage.objectAdmin instead of roles/storage.admin; it grants object-level read/write/delete without unnecessary bucket-level management permissions.
Best Practices
- Principle of least privilege: only grant the roles your service account actually needs (dataEditor for BigQuery writes, storage.objectAdmin for GCS object access, serviceUsageConsumer for API consumption).
- Test incrementally: run the script after each modification to catch errors early rather than debugging everything at once.
- Environment variables for credentials: never hard-code paths to credential files in your scripts.
- Read the existing code carefully: variable names matter; using vision_client vs client or desc vs text can cause NameError exceptions.
- Use document_text_detection over text_detection when dealing with dense text in images; it uses a more advanced OCR model.
Conclusion
This challenge lab walks you through a realistic ML pipeline pattern: ingest raw data (images), enrich it using ML APIs (Vision + Translation), and store structured results for analysis (BigQuery). These same building blocks — Cloud Storage for data lake, ML APIs for enrichment, BigQuery for analytics — appear in production architectures across industries. Mastering this flow gives you a solid foundation for building more complex ML data pipelines on Google Cloud.
