<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: William Schnaider Torres Bermon</title>
    <description>The latest articles on DEV Community by William Schnaider Torres Bermon (@willtorber).</description>
    <link>https://dev.to/willtorber</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3728365%2F69cdea6c-28ad-4266-9b7d-0e3dc79a8910.jpg</url>
      <title>DEV Community: William Schnaider Torres Bermon</title>
      <link>https://dev.to/willtorber</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/willtorber"/>
    <language>en</language>
    <item>
      <title>Solving "Analyze and Reason on Multimodal Data with Gemini: Challenge Lab" — A Complete Guide</title>
      <dc:creator>William Schnaider Torres Bermon</dc:creator>
      <pubDate>Wed, 08 Apr 2026 04:45:09 +0000</pubDate>
      <link>https://dev.to/willtorber/solving-analyze-and-reason-on-multimodal-data-with-gemini-challenge-lab-a-complete-guide-4che</link>
      <guid>https://dev.to/willtorber/solving-analyze-and-reason-on-multimodal-data-with-gemini-challenge-lab-a-complete-guide-4che</guid>
      <description>&lt;p&gt;Multimodal AI is no longer a futuristic concept — it's a practical tool that can analyze text reviews, product images, and podcast audio in a single workflow. In this post, I walk through the &lt;strong&gt;&lt;a href="https://www.skills.google/course_templates/1240/labs/618945?locale=en" rel="noopener noreferrer"&gt;GSP524 Challenge Lab&lt;/a&gt;&lt;/strong&gt; from Google Cloud Skills Boost, where we use the &lt;strong&gt;Gemini 2.5 Flash&lt;/strong&gt; model on Vertex AI to extract actionable marketing insights from three different data modalities for a fictional brand called &lt;strong&gt;Cymbal Direct&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If you're preparing for this lab or want to understand how multimodal prompting with Gemini actually works in practice, this guide covers every task with the reasoning behind each solution.&lt;/p&gt;




&lt;h2&gt;The Scenario&lt;/h2&gt;

&lt;p&gt;Cymbal Direct has just launched a new line of athletic apparel. Our job is to analyze social media engagement across three channels:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Text&lt;/strong&gt; — Customer reviews and social media posts (sentiment, themes, product mentions).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Images&lt;/strong&gt; — Influencer and customer photos (style trends, visual messaging, target audience).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audio&lt;/strong&gt; — A podcast interview with a Cymbal Direct representative (satisfaction drivers, biases, recommendations).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Finally, we synthesize everything into a comprehensive Markdown report and upload it to Cloud Storage.&lt;/p&gt;




&lt;h2&gt;Environment Setup (Task 1)&lt;/h2&gt;

&lt;p&gt;The lab provides a pre-configured &lt;strong&gt;Vertex AI Workbench&lt;/strong&gt; instance with a Jupyter notebook (&lt;code&gt;gsp524-challenge.ipynb&lt;/code&gt;). Task 1 has no TODOs — you just run the provided cells to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install the Google Gen AI SDK (&lt;code&gt;google-genai&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Restart the kernel (important — the new package won't load without this).&lt;/li&gt;
&lt;li&gt;Import all required libraries, including &lt;code&gt;Part&lt;/code&gt;, &lt;code&gt;ThinkingConfig&lt;/code&gt;, and &lt;code&gt;GenerateContentConfig&lt;/code&gt; from &lt;code&gt;google.genai.types&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Initialize the Gen AI client pointing to your lab project.&lt;/li&gt;
&lt;li&gt;Set the model ID to &lt;code&gt;gemini-2.5-flash&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Two critical pieces of configuration are set up here that you'll reuse throughout the lab:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# The client — your gateway to Gemini
&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vertexai&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;LOCATION&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# The model
&lt;/span&gt;&lt;span class="n"&gt;MODEL_ID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-2.5-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Later, a &lt;code&gt;config&lt;/code&gt; object enables &lt;strong&gt;Gemini thinking&lt;/strong&gt; (extended reasoning) with dynamic budget:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GenerateContentConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;thinking_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ThinkingConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;include_thoughts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;thinking_budget&lt;/span&gt;&lt;span class="o"&gt;=-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;  &lt;span class="c1"&gt;# Dynamic: model decides how much to reason
&lt;/span&gt;    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This &lt;code&gt;config&lt;/code&gt; is the key difference between a basic call and a deep-reasoning call. You'll use it in every "Deep Dive" section.&lt;/p&gt;




&lt;h2&gt;Task 2: Analyzing Customer Reviews (Text)&lt;/h2&gt;

&lt;h3&gt;Initial Analysis&lt;/h3&gt;

&lt;p&gt;The first real challenge is constructing a prompt that tells Gemini exactly what to extract from the raw text data. The reviews are loaded from a file, and we embed them directly into the prompt using an f-string:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
Analyze the following customer reviews and social media posts about
Cymbal Direct&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s new athletic apparel line. For each review or post:
- Identify the overall sentiment (positive, negative, or neutral).
- Extract key themes and topics discussed, such as product quality,
  fit, style, customer service, and pricing.
- Identify any frequently mentioned product names or specific features.

Provide a structured summary of your findings in Markdown format.

Customer Reviews and Social Media Posts:
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;text_data&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MODEL_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why this works:&lt;/strong&gt; The prompt is explicit about the three dimensions we care about (sentiment, themes, product names) and asks for structured Markdown output. Gemini handles the rest — it categorizes each review and surfaces patterns across the dataset.&lt;/p&gt;

&lt;h3&gt;Deep Dive with Thinking&lt;/h3&gt;

&lt;p&gt;Now we go deeper. The second prompt asks Gemini to &lt;em&gt;reason&lt;/em&gt; about what's driving sentiment and to role-play as a marketing consultant:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;thinking_mode_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
Analyze the following customer reviews and social media posts in detail.
Specifically:
- Identify the main factors driving positive and negative sentiment.
- Assess the overall impact on brand perception.
- Identify three key areas where Cymbal Direct can improve.
- Highlight the three most important takeaways as if presenting to
  the Cymbal Direct marketing team.

Customer Reviews and Social Media Posts:
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;text_data&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="n"&gt;thinking_model_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MODEL_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;thinking_mode_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# &amp;lt;-- This enables thinking mode
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The only API-level difference is passing &lt;code&gt;config=config&lt;/code&gt;. But the output is dramatically richer — Gemini shows its chain of thought before delivering the final answer, and the &lt;code&gt;print_thoughts()&lt;/code&gt; helper function separates these for display.&lt;/p&gt;
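&lt;p&gt;If you're curious what that helper might look like, here's a minimal sketch (the lab notebook provides its own version; this one only assumes that each response part carries a &lt;code&gt;thought&lt;/code&gt; flag when the request was made with &lt;code&gt;include_thoughts=True&lt;/code&gt;):&lt;/p&gt;

```python
# Hypothetical sketch of a print_thoughts-style helper.
# Assumption: each Part in the response exposes a `thought` boolean
# when the request used include_thoughts=True.
def print_thoughts(response):
    thoughts, answers = [], []
    for part in response.candidates[0].content.parts:
        if getattr(part, "thought", False):
            thoughts.append(part.text)   # intermediate reasoning
        else:
            answers.append(part.text)    # final answer text
    print("=== Thoughts ===")
    print("\n".join(thoughts))
    print("=== Answer ===")
    print("\n".join(answers))
    return thoughts, answers
```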

&lt;p&gt;The analysis is saved to &lt;code&gt;analysis/text_analysis.md&lt;/code&gt; for use in the final synthesis.&lt;/p&gt;
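&lt;p&gt;The save step itself is plain file I/O; a minimal version (the helper name here is illustrative, not from the lab) could look like this:&lt;/p&gt;

```python
from pathlib import Path

def save_analysis(markdown_text, path="analysis/text_analysis.md"):
    # Persist the model's Markdown output so the synthesis task can
    # re-read it later without re-running the model call.
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(markdown_text)
    return out
```

In the notebook you would call it as &lt;code&gt;save_analysis(thinking_model_response.text)&lt;/code&gt;.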




&lt;h2&gt;Task 3: Analyzing Images (Visual Content)&lt;/h2&gt;

&lt;h3&gt;Initial Analysis&lt;/h3&gt;

&lt;p&gt;Images require a different content structure. Instead of embedding data in the prompt string, we pass a list of &lt;code&gt;Part&lt;/code&gt; objects alongside the prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
Analyze the following images of Cymbal Direct&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s new athletic apparel line.
For each image:
- Identify the apparel items shown.
- Describe the attributes of each item (color, style, material, branding).
- Identify any prominent style trends or preferences across the images.
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MODEL_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;image_parts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Prompt + list of image Part objects
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key pattern:&lt;/strong&gt; For multimodal content, &lt;code&gt;contents&lt;/code&gt; accepts a list where the first element is the text prompt and subsequent elements are &lt;code&gt;Part&lt;/code&gt; objects (images, audio, video). The images are loaded as bytes and wrapped with &lt;code&gt;Part.from_bytes()&lt;/code&gt;.&lt;/p&gt;
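&lt;p&gt;As a rough sketch of that loading step (the directory name is illustrative, not from the lab), each file is read as raw bytes and paired with its MIME type before being wrapped:&lt;/p&gt;

```python
import mimetypes
from pathlib import Path

def load_image_bytes(image_dir):
    """Collect (bytes, mime_type) pairs for every image in a directory,
    ready to wrap with Part.from_bytes(data=..., mime_type=...)."""
    pairs = []
    for path in sorted(Path(image_dir).iterdir()):
        mime, _ = mimetypes.guess_type(path.name)
        if mime and mime.startswith("image/"):   # skip non-image files
            pairs.append((path.read_bytes(), mime))
    return pairs

# image_parts = [Part.from_bytes(data=d, mime_type=m)
#                for d, m in load_image_bytes("images")]
```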

&lt;h3&gt;Reasoning on Image Trends&lt;/h3&gt;

&lt;p&gt;The deep dive asks Gemini to go beyond description into &lt;em&gt;inference&lt;/em&gt; — hypothesizing about target audience, analyzing visual composition, and comparing to broader fashion trends:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;thinking_mode_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
Analyze the images in greater detail:
- Hypothesize about the target audience for each image.
- Analyze how visual elements contribute to the overall message and appeal.
- Compare observed trends with broader athletic wear fashion trends.
- Provide recommendations for future marketing campaigns.
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="n"&gt;thinking_model_response_image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MODEL_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;thinking_mode_prompt&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;image_parts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same pattern: prompt + image parts + thinking config. Results are saved to &lt;code&gt;analysis/image_analysis.md&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;Task 4: Analyzing Audio (Podcast)&lt;/h2&gt;

&lt;h3&gt;Initial Analysis&lt;/h3&gt;

&lt;p&gt;Audio follows the same multimodal pattern, but uses &lt;code&gt;Part.from_uri()&lt;/code&gt; instead of &lt;code&gt;Part.from_bytes()&lt;/code&gt; since the audio file lives in Cloud Storage:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Audio part (created in a setup cell)
&lt;/span&gt;&lt;span class="n"&gt;audio_part&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_uri&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;file_uri&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gs://&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;-bucket/media/audio/cymbal_direct_expert_interview.wav&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;mime_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;audio/wav&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
Analyze the following audio recording:
- Transcribe the conversation, identifying different speakers.
- Provide sentiment analysis (positive, negative, neutral opinions).
- Identify key themes (comfort, fit, performance, style, competitor comparisons).
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MODEL_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;audio_part&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;  &lt;span class="c1"&gt;# Audio first, then prompt
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Note the order:&lt;/strong&gt; For audio, the &lt;code&gt;audio_part&lt;/code&gt; comes &lt;em&gt;before&lt;/em&gt; the prompt in the contents list. This is a subtle but important detail — Gemini processes the audio first, then applies the prompt instructions to it.&lt;/p&gt;

&lt;h3&gt;Reasoning on Audio Insights&lt;/h3&gt;

&lt;p&gt;The deep dive extracts strategic intelligence from the conversation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;thinking_mode_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
Analyze the audio recording in greater detail:
- Reason about overall customer satisfaction.
- Deduce key factors influencing customer perception.
- Develop three data-driven recommendations.
- Identify potential biases or limitations in the audio data.
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="n"&gt;thinking_model_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MODEL_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;audio_part&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;thinking_mode_prompt&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is particularly interesting because Gemini can identify biases like interviewer framing or selection bias in who was invited to the podcast — something that requires genuine reasoning, not just transcription.&lt;/p&gt;




&lt;h2&gt;Task 5: Synthesizing Multimodal Insights&lt;/h2&gt;

&lt;p&gt;The final task loads all three analysis files and asks Gemini to produce a unified report:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;comprehensive_report_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
Based on the following combined analysis of text reviews, image analysis,
and audio insights, generate a comprehensive report:
- Summarize overall sentiment across all data modalities.
- Identify key themes and trends in customer feedback.
- Provide insights on style preferences, usage patterns, and behavior.
- Evaluate how audio insights fit with product image and text feedback.
- Offer actionable recommendations for marketing strategy and positioning.

Format the report in well-structured Markdown with clear sections.

Combined Analysis Results:
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;all_analysis&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="n"&gt;thinking_model_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MODEL_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;comprehensive_report_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After generating the report, it's saved locally and uploaded to Cloud Storage:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="n"&gt;gcloud&lt;/span&gt; &lt;span class="n"&gt;storage&lt;/span&gt; &lt;span class="n"&gt;cp&lt;/span&gt; &lt;span class="n"&gt;analysis&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;final_report&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;md&lt;/span&gt; &lt;span class="n"&gt;gs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;//&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;analysis&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;final_report&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;md&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This last step is what the grading system checks, so don't skip it.&lt;/p&gt;




&lt;h2&gt;Key Learnings&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;One API, three modalities.&lt;/strong&gt; The &lt;code&gt;generate_content&lt;/code&gt; method handles text, images, and audio with the same interface — the only difference is how you construct the &lt;code&gt;contents&lt;/code&gt; list.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Thinking mode is a single config toggle.&lt;/strong&gt; Adding &lt;code&gt;config=config&lt;/code&gt; with &lt;code&gt;include_thoughts=True&lt;/code&gt; transforms a surface-level response into a reasoned analysis. The &lt;code&gt;-1&lt;/code&gt; thinking budget lets the model decide how deep to go based on prompt complexity.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Prompt specificity drives output quality.&lt;/strong&gt; Vague prompts produce vague results. Each prompt in this lab explicitly lists the dimensions to analyze (sentiment, themes, audience, recommendations), and the output quality reflects that precision.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Content ordering matters for multimodal inputs.&lt;/strong&gt; For images, the prompt comes first followed by image parts. For audio, the audio part comes first. This isn't arbitrary — it affects how the model processes the input.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Chaining analyses enables synthesis.&lt;/strong&gt; By saving intermediate results to files and feeding them into a final prompt, we build a pipeline where each modality's insights compound into a richer final report.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;Best Practices&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Always ask for structured output.&lt;/strong&gt; Requesting "Markdown format with clear sections" gives you parseable, presentable results instead of a wall of text.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Use thinking mode for analysis, skip it for extraction.&lt;/strong&gt; Initial passes (transcription, item identification) don't need extended reasoning. Deep dives (inferring audience, identifying biases, generating recommendations) benefit enormously from it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Embed data directly in prompts for text; use Part objects for binary data.&lt;/strong&gt; Text data fits naturally inside f-strings. Images and audio should always go through &lt;code&gt;Part.from_bytes()&lt;/code&gt; or &lt;code&gt;Part.from_uri()&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Save intermediate results.&lt;/strong&gt; Writing each analysis to a file creates a paper trail and enables the final synthesis step without re-running expensive model calls.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Don't forget the upload.&lt;/strong&gt; In challenge labs, the grading system checks Cloud Storage — your analysis could be perfect, but if the file isn't in the bucket, you won't pass.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;This challenge lab demonstrates a realistic workflow for multimodal AI analysis: ingest data from different sources, extract structured insights from each, apply deeper reasoning where it matters, and synthesize everything into a decision-ready report. The Gemini 2.5 Flash model on Vertex AI makes this surprisingly straightforward — the same &lt;code&gt;generate_content&lt;/code&gt; call handles text, images, and audio, and the thinking mode adds genuine analytical depth without requiring a different model or API.&lt;/p&gt;

&lt;p&gt;The patterns here — structured prompts, multimodal content lists, thinking configuration, and chained analyses — are directly applicable to real-world use cases like brand monitoring, market research, and content analysis. The hard part isn't the API calls; it's crafting prompts that extract the right insights from the right data.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>googleaichallenge</category>
      <category>python</category>
      <category>googlecloud</category>
    </item>
    <item>
      <title>Solving "Use Machine Learning APIs on Google Cloud: Challenge Lab" — A Complete Guide</title>
      <dc:creator>William Schnaider Torres Bermon</dc:creator>
      <pubDate>Thu, 19 Mar 2026 01:41:08 +0000</pubDate>
      <link>https://dev.to/willtorber/solving-use-machine-learning-apis-on-google-cloud-challenge-lab-a-complete-guide-4no6</link>
      <guid>https://dev.to/willtorber/solving-use-machine-learning-apis-on-google-cloud-challenge-lab-a-complete-guide-4no6</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;This &lt;a href="https://www.skills.google/course_templates/630/labs/612231?locale=en" rel="noopener noreferrer"&gt;challenge&lt;/a&gt; lab tests your ability to build an end-to-end pipeline that extracts text from images using the &lt;strong&gt;Cloud Vision API&lt;/strong&gt;, translates it with the &lt;strong&gt;Cloud Translation API&lt;/strong&gt;, and loads the results into &lt;strong&gt;BigQuery&lt;/strong&gt;. Unlike guided labs, you're expected to fill in the blanks of a partially written Python script and configure IAM permissions yourself.&lt;/p&gt;

&lt;p&gt;Let's walk through every task with clear explanations of &lt;em&gt;why&lt;/em&gt; each step matters.&lt;/p&gt;




&lt;h2&gt;The Architecture&lt;/h2&gt;

&lt;p&gt;The pipeline works like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A Python script reads image files from a &lt;strong&gt;Cloud Storage&lt;/strong&gt; bucket&lt;/li&gt;
&lt;li&gt;Each image is sent to the &lt;strong&gt;Cloud Vision API&lt;/strong&gt; for text detection&lt;/li&gt;
&lt;li&gt;The extracted text is saved back to Cloud Storage as a &lt;code&gt;.txt&lt;/code&gt; file&lt;/li&gt;
&lt;li&gt;If the text is &lt;strong&gt;not&lt;/strong&gt; in Japanese (&lt;code&gt;locale != 'ja'&lt;/code&gt;), it's sent to the &lt;strong&gt;Translation API&lt;/strong&gt; to get a Japanese translation&lt;/li&gt;
&lt;li&gt;All results (original text, locale, translation) are uploaded to a &lt;strong&gt;BigQuery&lt;/strong&gt; table.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgn4p2j9bdb8nyx6y3zh7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgn4p2j9bdb8nyx6y3zh7.png" alt="Graphic description of the challenge" width="800" height="250"&gt;&lt;/a&gt;&lt;/p&gt;
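&lt;p&gt;The control flow above can be sketched as a single function. The function and parameter names below are illustrative (the lab's script wires these steps to the real client libraries), but the branching matches the description:&lt;/p&gt;

```python
def process_image(image_uri, detect_text, translate_to_ja, insert_row):
    # Step 2: Cloud Vision text detection returns the text and its locale
    text, locale = detect_text(image_uri)
    # Step 4: only text that is NOT already Japanese goes to the Translation API
    translated = translate_to_ja(text) if locale != "ja" else text
    # Step 5: load original text, locale, and translation into BigQuery
    insert_row({"text": text, "locale": locale, "translation": translated})
    return translated
```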




&lt;h2&gt;Task 1: Configure a Service Account&lt;/h2&gt;

&lt;h3&gt;Why a Service Account?&lt;/h3&gt;

&lt;p&gt;The Python script needs programmatic access to Vision API, Translation API, Cloud Storage, and BigQuery. A service account acts as the script's identity, and IAM roles define what it can do.&lt;/p&gt;

&lt;h3&gt;Commands&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Set your project ID&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;gcloud config get-value project&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Create the service account&lt;/span&gt;
gcloud iam service-accounts create my-ml-sa &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--display-name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"ML API Service Account"&lt;/span&gt;

&lt;span class="c"&gt;# Grant BigQuery Data Editor role (to insert rows)&lt;/span&gt;
gcloud projects add-iam-policy-binding &lt;span class="nv"&gt;$PROJECT_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--member&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"serviceAccount:my-ml-sa@&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.iam.gserviceaccount.com"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"roles/bigquery.dataEditor"&lt;/span&gt;

&lt;span class="c"&gt;# Grant Cloud Storage Object Admin role (to read images and write text files)&lt;/span&gt;
gcloud projects add-iam-policy-binding &lt;span class="nv"&gt;$PROJECT_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--member&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"serviceAccount:my-ml-sa@&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.iam.gserviceaccount.com"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"roles/storage.objectAdmin"&lt;/span&gt;

&lt;span class="c"&gt;# Grant Service Usage Consumer role (required to make API calls within the project)&lt;/span&gt;
gcloud projects add-iam-policy-binding &lt;span class="nv"&gt;$PROJECT_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--member&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"serviceAccount:my-ml-sa@&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.iam.gserviceaccount.com"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"roles/serviceusage.serviceUsageConsumer"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Important:&lt;/strong&gt; Without &lt;code&gt;roles/serviceusage.serviceUsageConsumer&lt;/code&gt;, the service account cannot consume any enabled APIs in the project (BigQuery, Vision, Translation, etc.), even if it has data-level roles like &lt;code&gt;dataEditor&lt;/code&gt; or &lt;code&gt;storage.objectAdmin&lt;/code&gt;. This results in a &lt;code&gt;403 USER_PROJECT_DENIED&lt;/code&gt; error.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Verification
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud projects get-iam-policy &lt;span class="nv"&gt;$PROJECT_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--flatten&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"bindings[].members"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--filter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"bindings.members:my-ml-sa@"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see &lt;code&gt;roles/bigquery.dataEditor&lt;/code&gt;, &lt;code&gt;roles/storage.objectAdmin&lt;/code&gt;, and &lt;code&gt;roles/serviceusage.serviceUsageConsumer&lt;/code&gt; listed.&lt;/p&gt;
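&lt;p&gt;If you'd rather check the bindings in code than eyeball the output, here's a minimal sketch. It assumes you've captured just the role names, for example by adding &lt;code&gt;--format="value(bindings.role)"&lt;/code&gt; to the command above.&lt;/p&gt;

```python
# Minimal sketch: confirm the three required roles were all granted.
REQUIRED_ROLES = {
    "roles/bigquery.dataEditor",
    "roles/storage.objectAdmin",
    "roles/serviceusage.serviceUsageConsumer",
}

def missing_roles(granted):
    """Return any required roles absent from the granted list."""
    return sorted(REQUIRED_ROLES - set(granted))

# Sample: serviceUsageConsumer was forgotten
granted = ["roles/bigquery.dataEditor", "roles/storage.objectAdmin"]
print(missing_roles(granted))  # → ['roles/serviceusage.serviceUsageConsumer']
```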




&lt;h2&gt;
  
  
  Task 2: Create and Download Credentials
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Why Download a Key?
&lt;/h3&gt;

&lt;p&gt;While Cloud Shell has default credentials for the logged-in user, the challenge explicitly requires you to create a JSON key file and point the &lt;code&gt;GOOGLE_APPLICATION_CREDENTIALS&lt;/code&gt; environment variable to it. This simulates how credentials work in production environments outside GCP.&lt;/p&gt;

&lt;h3&gt;
  
  
  Commands
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Generate the JSON key file&lt;/span&gt;
gcloud iam service-accounts keys create ml-sa-key.json &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--iam-account&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;my-ml-sa@&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;.iam.gserviceaccount.com

&lt;span class="c"&gt;# Set the environment variable so Google Cloud client libraries find the key&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;GOOGLE_APPLICATION_CREDENTIALS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PWD&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;/ml-sa-key.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Task 3: Modify the Script — Vision API Text Detection
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Get the Script
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gsutil &lt;span class="nb"&gt;cp &lt;/span&gt;gs://&lt;span class="nv"&gt;$PROJECT_ID&lt;/span&gt;/analyze-images-v2.py &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  What to Modify
&lt;/h3&gt;

&lt;p&gt;The script has four sections that need your attention: three &lt;code&gt;# TBD:&lt;/code&gt; comments and one commented-out BigQuery upload line. Open the script with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;nano analyze-images-v2.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;TBD #1 — Create a Vision API image object:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Find the comment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# TBD: Create a Vision API image object called image_object
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add below it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;image_object&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vision&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;file_content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This creates an &lt;code&gt;Image&lt;/code&gt; object from the raw bytes downloaded from Cloud Storage (&lt;code&gt;file_content&lt;/code&gt;). The Vision API requires this object format to process images.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TBD #2 — Call the Vision API to detect text:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Find the comment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# TBD: Detect text in the image and save the response data into an object called response
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add below it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vision_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;document_text_detection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;image_object&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This sends the image to the Vision API's &lt;code&gt;document_text_detection&lt;/code&gt; method, which is optimized for dense, document-style text (its sibling method &lt;code&gt;text_detection&lt;/code&gt; targets sparse text such as street signs). Note that the client variable is called &lt;code&gt;vision_client&lt;/code&gt; (as defined earlier in the script), and the image parameter uses the &lt;code&gt;image_object&lt;/code&gt; we just created.&lt;/p&gt;

&lt;h3&gt;
  
  
  Test It
&lt;/h3&gt;

&lt;p&gt;Run the script after completing TBDs #1 and #2 to verify text extraction works before moving on:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python3 analyze-images-v2.py &lt;span class="nv"&gt;$PROJECT_ID&lt;/span&gt; &lt;span class="nv"&gt;$PROJECT_ID&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see extracted text appearing in the console output.&lt;/p&gt;




&lt;h2&gt;
  
  
  Task 4: Modify the Script — Translation API
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What to Modify
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;TBD #3 — Translate non-Japanese text to Japanese:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Find the comment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# TBD: According to the target language pass the description data to the translation API
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add below it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;translation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;translate_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;translate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;desc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;target_language&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ja&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key details:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We use &lt;code&gt;desc&lt;/code&gt; (not a generic variable like &lt;code&gt;text&lt;/code&gt;) because that's the variable name the script assigns to the extracted description earlier: &lt;code&gt;desc = response.text_annotations[0].description&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;The target language is &lt;code&gt;'ja'&lt;/code&gt; (Japanese) as specified in the lab instructions&lt;/li&gt;
&lt;li&gt;The result is stored in &lt;code&gt;translation&lt;/code&gt;, and the script already accesses &lt;code&gt;translation['translatedText']&lt;/code&gt; on the next line&lt;/li&gt;
&lt;/ul&gt;
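&lt;p&gt;Because &lt;code&gt;translate()&lt;/code&gt; returns a plain dictionary, the &lt;code&gt;translation['translatedText']&lt;/code&gt; lookup is all the script needs afterward. A small sketch with a sample response in the &lt;code&gt;translate_v2&lt;/code&gt; shape (the values are illustrative, not real API output):&lt;/p&gt;

```python
# Sample dict in the shape returned by translate_v2's Client.translate()
# (illustrative values, not a real API response)
translation = {
    "translatedText": "こんにちは、世界",
    "detectedSourceLanguage": "en",
    "input": "Hello, world",
}

# The script reads the result the same way on the following line
translated_text = translation["translatedText"]
print(translated_text)  # → こんにちは、世界
```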

&lt;h2&gt;
  
  
  Enable the BigQuery Upload
&lt;/h2&gt;

&lt;p&gt;At the very end of the script, find the commented-out line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# errors = bq_client.insert_rows(table, rows_for_bq)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Remove the &lt;code&gt;#&lt;/code&gt; to enable it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;errors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bq_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert_rows&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rows_for_bq&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The line immediately after (&lt;code&gt;assert errors == []&lt;/code&gt;) will verify the upload succeeded.&lt;/p&gt;
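&lt;p&gt;If you want something gentler than a bare &lt;code&gt;assert&lt;/code&gt;, you can inspect the returned list yourself. The sketch below assumes the typical streaming-insert error shape (one dict per failed row, with &lt;code&gt;index&lt;/code&gt; and &lt;code&gt;errors&lt;/code&gt; keys):&lt;/p&gt;

```python
def report_insert_errors(errors):
    """Print a summary of streaming-insert errors; True means success."""
    if not errors:  # insert_rows returns an empty list when all rows land
        return True
    for entry in errors:  # assumed shape: {'index': ..., 'errors': [...]}
        print(f"row {entry.get('index')}: {entry.get('errors')}")
    return False

# Success case, as the script expects:
print(report_insert_errors([]))  # → True
```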

&lt;h3&gt;
  
  
  Complete Modified Script Reference
&lt;/h3&gt;

&lt;p&gt;Here's a summary of all four changes in the script:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Location in Script&lt;/th&gt;
&lt;th&gt;What to Add / Change&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;After &lt;code&gt;# TBD: Create a Vision API image object&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;&lt;code&gt;image_object = vision.Image(content=file_content)&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;After &lt;code&gt;# TBD: Detect text in the image&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;&lt;code&gt;response = vision_client.document_text_detection(image=image_object)&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;After &lt;code&gt;# TBD: According to the target language&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;&lt;code&gt;translation = translate_client.translate(desc, target_language='ja')&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Last commented line&lt;/td&gt;
&lt;td&gt;Remove &lt;code&gt;#&lt;/code&gt; from &lt;code&gt;errors = bq_client.insert_rows(table, rows_for_bq)&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Run the Complete Script
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python3 analyze-images-v2.py &lt;span class="nv"&gt;$PROJECT_ID&lt;/span&gt; &lt;span class="nv"&gt;$PROJECT_ID&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Watch the output — you should see text being extracted from each image, locale detection, and Japanese translations for non-Japanese text, followed by "Writing Vision API image data to BigQuery..."&lt;/p&gt;




&lt;h2&gt;
  
  
  Understanding the Python Script (&lt;code&gt;analyze-images-v2.py&lt;/code&gt;)
&lt;/h2&gt;

&lt;p&gt;The modifications above are easier to reason about once you understand the script as a whole. Here's a general overview followed by a line-by-line breakdown.&lt;/p&gt;

&lt;h3&gt;
  
  
  General Overview
&lt;/h3&gt;

&lt;p&gt;The script is an automated image-processing pipeline that ties together four Google Cloud services: Cloud Storage (to read images and write text files), Vision API (to extract text from images via OCR), Translation API (to translate non-Japanese text into Japanese), and BigQuery (to store the final results in a queryable table).&lt;/p&gt;

&lt;p&gt;The workflow for each image is: download the image bytes from the bucket → send them to the Vision API → save the detected text back to Cloud Storage as a &lt;code&gt;.txt&lt;/code&gt; file → check the language locale → if not Japanese, translate to Japanese → collect all results → batch-upload everything to BigQuery at the end.&lt;/p&gt;
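&lt;p&gt;The per-image workflow can be sketched in plain Python with the API calls stubbed out. &lt;code&gt;detect_text&lt;/code&gt; and &lt;code&gt;translate_to_ja&lt;/code&gt; below are placeholders for the real client calls, not part of the script:&lt;/p&gt;

```python
# Sketch of the per-image workflow; the two helpers are stand-ins
# for the Vision and Translation API calls (illustrative only).
def detect_text(image_bytes):
    """Stand-in for vision_client.document_text_detection()."""
    return {"text": "Hello", "locale": "en"}

def translate_to_ja(text):
    """Stand-in for translate_client.translate(..., target_language='ja')."""
    return f"[ja] {text}"

def process_image(file_name, image_bytes):
    result = detect_text(image_bytes)
    desc, locale = result["text"], result["locale"]
    # Only non-Japanese text gets translated, exactly as in the script
    translated = desc if locale == "ja" else translate_to_ja(desc)
    return (desc, locale, translated, file_name)

rows_for_bq = [process_image("sign.jpg", b"fake-bytes")]
print(rows_for_bq)  # → [('Hello', 'en', '[ja] Hello', 'sign.jpg')]
```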

&lt;h3&gt;
  
  
  Line-by-Line Breakdown
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Dataset: image_classification_dataset
# Table name: image_text_detail
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Lines 1-4:&lt;/strong&gt; Comments documenting the target BigQuery dataset/table. Imports &lt;code&gt;os&lt;/code&gt; (to read environment variables) and &lt;code&gt;sys&lt;/code&gt; (to read command-line arguments).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.cloud&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;storage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bigquery&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;language&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vision&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;translate_v2&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Line 7:&lt;/strong&gt; Imports the five Google Cloud client libraries. &lt;code&gt;storage&lt;/code&gt; for Cloud Storage, &lt;code&gt;bigquery&lt;/code&gt; for BigQuery, &lt;code&gt;language&lt;/code&gt; for Natural Language API (not used in this script but imported from the original template), &lt;code&gt;vision&lt;/code&gt; for Vision API, and &lt;code&gt;translate_v2&lt;/code&gt; for the Translation API.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;GOOGLE_APPLICATION_CREDENTIALS&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;GOOGLE_APPLICATION_CREDENTIALS&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])):&lt;/span&gt;
        &lt;span class="nf"&gt;print &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The GOOGLE_APPLICATION_CREDENTIALS file does not exist.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;exit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The GOOGLE_APPLICATION_CREDENTIALS environment variable is not defined.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;exit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Lines 9-15:&lt;/strong&gt; &lt;strong&gt;Credentials check.&lt;/strong&gt; Verifies two things: (1) the &lt;code&gt;GOOGLE_APPLICATION_CREDENTIALS&lt;/code&gt; environment variable is set, and (2) the file it points to actually exists on disk. If either check fails, the script exits immediately with an error message. This is a safety gate — without valid credentials, no API call will work.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;You must provide parameters for the Google Cloud project ID and Storage bucket&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;python3 &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;[PROJECT_NAME] [BUCKET_NAME]&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;exit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;project_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;bucket_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;argv&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Lines 17-23:&lt;/strong&gt; &lt;strong&gt;Argument parsing.&lt;/strong&gt; The script requires two command-line arguments: the GCP project ID and the Cloud Storage bucket name. In this lab, both are the same value (your project ID). If you forget to pass them, the script prints usage instructions and exits.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;storage_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;storage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;bq_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bigquery&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;project_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;nl_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;language&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;LanguageServiceClient&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Lines 26-28:&lt;/strong&gt; &lt;strong&gt;Client initialization (part 1).&lt;/strong&gt; Creates client objects for Cloud Storage, BigQuery (bound to your project), and the Natural Language API. The &lt;code&gt;nl_client&lt;/code&gt; is inherited from the original template but not used in this challenge.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;vision_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vision&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ImageAnnotatorClient&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;translate_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;translate_v2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Lines 31-32:&lt;/strong&gt; &lt;strong&gt;Client initialization (part 2).&lt;/strong&gt; Creates the Vision API client (for text detection) and the Translation API client (for translating text). These are the two ML API clients you'll use in the TBD sections.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;dataset_ref&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bq_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;image_classification_dataset&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;dataset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bigquery&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dataset_ref&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;table_ref&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;image_text_detail&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bq_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;table_ref&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Lines 35-38:&lt;/strong&gt; &lt;strong&gt;BigQuery table setup.&lt;/strong&gt; Creates a reference chain: dataset name → dataset object → table name → table object. The &lt;code&gt;get_table()&lt;/code&gt; call actually contacts BigQuery to verify the table exists and retrieves its schema. This is where the &lt;code&gt;403 USER_PROJECT_DENIED&lt;/code&gt; error occurs if the service account lacks the &lt;code&gt;serviceUsageConsumer&lt;/code&gt; role.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;rows_for_bq&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Line 41:&lt;/strong&gt; &lt;strong&gt;Results buffer.&lt;/strong&gt; Initializes an empty list that will accumulate tuples of &lt;code&gt;(description, locale, translated_text, filename)&lt;/code&gt; for each processed image. These get batch-uploaded to BigQuery at the end.&lt;br&gt;
&lt;/p&gt;
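&lt;p&gt;For illustration, the buffer ends up holding tuples like these (sample values, not real pipeline output):&lt;/p&gt;

```python
rows_for_bq = []

# Each processed image appends one tuple matching the table's schema:
# (description, locale, translated text, source file name). Sample values.
rows_for_bq.append(("SALE 50% OFF", "en", "50%オフセール", "sign1.jpg"))
rows_for_bq.append(("いらっしゃいませ", "ja", "いらっしゃいませ", "sign2.jpg"))

# Deferring the upload means a single BigQuery call at the end,
# instead of one streaming insert per image
print(len(rows_for_bq))  # → 2
```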

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;files&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;storage_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bucket_name&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;list_blobs&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;bucket&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;storage_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bucket_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Lines 44-45:&lt;/strong&gt; &lt;strong&gt;Bucket access.&lt;/strong&gt; &lt;code&gt;list_blobs()&lt;/code&gt; returns an iterator over every file (blob) in the bucket. The &lt;code&gt;bucket&lt;/code&gt; object is saved separately because we'll need it later to upload text files.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Processing image files from GCS. This will take a few minutes..&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Line 47:&lt;/strong&gt; Status message so you know the script is working.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;files&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;endswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;jpg&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt;  &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;endswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;png&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;file_content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;download_as_string&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Lines 50-52:&lt;/strong&gt; &lt;strong&gt;Main loop start.&lt;/strong&gt; Iterates over every blob in the bucket, filters for image files (&lt;code&gt;.jpg&lt;/code&gt; or &lt;code&gt;.png&lt;/code&gt;), and downloads the image as raw bytes into &lt;code&gt;file_content&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;
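&lt;p&gt;As a side note, &lt;code&gt;str.endswith&lt;/code&gt; also accepts a tuple of suffixes, so the same filter can be written more compactly (behaviorally equivalent to the script's &lt;code&gt;or&lt;/code&gt; chain):&lt;/p&gt;

```python
names = ["sign.jpg", "photo.png", "notes.txt", "archive.zip"]

# endswith() accepts a tuple of suffixes, equivalent to the or-chain
images = [n for n in names if n.endswith(("jpg", "png"))]
print(images)  # → ['sign.jpg', 'photo.png']
```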

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;        &lt;span class="c1"&gt;# TBD: Create a Vision API image object called image_object
&lt;/span&gt;        &lt;span class="n"&gt;image_object&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vision&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;file_content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="c1"&gt;# ← YOU ADD THIS
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Line 55 (TBD #1):&lt;/strong&gt; Wraps the raw image bytes into a &lt;code&gt;vision.Image&lt;/code&gt; object. The Vision API cannot accept raw bytes directly — it needs this structured object that can hold either image bytes (&lt;code&gt;content&lt;/code&gt;) or a GCS URI (&lt;code&gt;source&lt;/code&gt;).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;        &lt;span class="c1"&gt;# TBD: Detect text in the image and save the response data into an object called response
&lt;/span&gt;        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vision_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;document_text_detection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;image_object&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="c1"&gt;# ← YOU ADD THIS
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Line 59 (TBD #2):&lt;/strong&gt; Sends the image to the Vision API's &lt;code&gt;document_text_detection&lt;/code&gt; method. This performs OCR (Optical Character Recognition) optimized for dense text. The response contains a list of &lt;code&gt;text_annotations&lt;/code&gt; — the first element holds the full concatenated text and the detected language.&lt;br&gt;
&lt;/p&gt;
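&lt;p&gt;To see what the code below is unpacking, here is the response shape mocked with plain Python objects (no credentials needed) — the attribute names match the real Vision API response, but the values are illustrative:&lt;/p&gt;

```python
from types import SimpleNamespace

# Mocked stand-in for a Vision API document_text_detection response.
# The real response exposes the same attribute shape: text_annotations[0]
# carries the full concatenated text plus its locale, and the remaining
# elements carry the individual word-level detections.
mock_response = SimpleNamespace(text_annotations=[
    SimpleNamespace(description="CYMBAL\nDIRECT", locale="en"),
    SimpleNamespace(description="CYMBAL", locale=""),
    SimpleNamespace(description="DIRECT", locale=""),
])

full_text = mock_response.text_annotations[0].description   # complete text
words = [a.description for a in mock_response.text_annotations[1:]]

print(words)   # ['CYMBAL', 'DIRECT']
```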

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;        &lt;span class="n"&gt;text_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text_annotations&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Line 62:&lt;/strong&gt; Extracts the full detected text from the first annotation. Whenever the Vision API detects any text, the &lt;code&gt;text_annotations&lt;/code&gt; array puts the complete concatenated text in index &lt;code&gt;[0]&lt;/code&gt;, with individual word-level detections in subsequent indices. Note that if an image contains no text at all, the array is empty and this line raises an &lt;code&gt;IndexError&lt;/code&gt;; the &lt;code&gt;len&lt;/code&gt; check later in the script guards only the BigQuery append.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;        &lt;span class="n"&gt;file_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;.txt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
        &lt;span class="n"&gt;blob&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;blob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;blob&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upload_from_string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text/plain&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Lines 65-67:&lt;/strong&gt; &lt;strong&gt;Save text to Cloud Storage.&lt;/strong&gt; Converts the image filename (e.g., &lt;code&gt;sign1.jpg&lt;/code&gt;) to a text filename (&lt;code&gt;sign1.txt&lt;/code&gt;), creates a blob reference, and uploads the extracted text. This creates a text file in the same bucket for each processed image.&lt;br&gt;
&lt;/p&gt;
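&lt;p&gt;One caveat worth knowing: &lt;code&gt;split('.')[0]&lt;/code&gt; works for the lab's simple filenames, but it truncates any name that contains extra dots. If you reuse this pattern outside the lab, &lt;code&gt;os.path.splitext&lt;/code&gt; is safer — a small sketch (my suggestion, not part of the lab script):&lt;/p&gt;

```python
import os

def text_blob_name(image_name):
    # os.path.splitext strips only the final extension, so names with
    # embedded dots survive intact (unlike image_name.split('.')[0]).
    base, _ext = os.path.splitext(image_name)
    return base + '.txt'

print(text_blob_name('sign1.jpg'))        # sign1.txt
print(text_blob_name('store.front.png'))  # store.front.txt
# The lab's split('.') approach would turn the second name into 'store.txt'.
```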

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;        &lt;span class="n"&gt;desc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text_annotations&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;
        &lt;span class="n"&gt;locale&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text_annotations&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;locale&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Lines 72-73:&lt;/strong&gt; Extracts the description (full text) and locale (language code like &lt;code&gt;'en'&lt;/code&gt;, &lt;code&gt;'ja'&lt;/code&gt;, &lt;code&gt;'fr'&lt;/code&gt;) from the response. Note that &lt;code&gt;desc&lt;/code&gt; is the same value as &lt;code&gt;text_data&lt;/code&gt; — the script extracts it again for clarity of variable naming.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;locale&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;translated_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;desc&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# TBD: According to the target language pass the description data to the translation API
&lt;/span&gt;            &lt;span class="n"&gt;translation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;translate_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;translate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;desc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;target_language&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ja&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="c1"&gt;# ← YOU ADD THIS
&lt;/span&gt;
            &lt;span class="n"&gt;translated_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;translation&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;translatedText&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Lines 77-83 (TBD #3):&lt;/strong&gt; &lt;strong&gt;Translation logic.&lt;/strong&gt; If the locale is empty (no language detected), the original text is used as-is. Otherwise, the text is sent to the Translation API with &lt;code&gt;target_language='ja'&lt;/code&gt; (Japanese). The API returns a dictionary; the translated text is in the &lt;code&gt;'translatedText'&lt;/code&gt; key.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;translated_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Line 84:&lt;/strong&gt; Prints the translated (or original) text to the console so you can monitor progress.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text_annotations&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;rows_for_bq&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;desc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;locale&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;translated_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Lines 88-89:&lt;/strong&gt; &lt;strong&gt;Collect results.&lt;/strong&gt; If the Vision API found any text (safety check), appends a tuple with the original text, locale, translated text, and filename to the results buffer. This tuple matches the BigQuery table schema.&lt;br&gt;
&lt;/p&gt;
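&lt;p&gt;Because &lt;code&gt;insert_rows()&lt;/code&gt; maps tuple fields to table columns positionally, the order inside the tuple is load-bearing. A quick way to sanity-check it (the column names here are illustrative — verify them against the lab's actual table schema in the console):&lt;/p&gt;

```python
# Illustrative column names; insert_rows() pairs tuple fields with the
# table schema strictly by position, so order must match exactly.
SCHEMA_COLUMNS = ('description', 'locale', 'translated_text', 'file_name')

row = ('CYMBAL DIRECT', 'en', 'シンバルダイレクト', 'sign1.jpg')
assert len(row) == len(SCHEMA_COLUMNS)

# Zipping them makes the positional mapping explicit for inspection.
print(dict(zip(SCHEMA_COLUMNS, row)))
```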

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Writing Vision API image data to BigQuery...&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;errors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bq_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert_rows&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rows_for_bq&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="c1"&gt;# ← YOU UNCOMMENT THIS
&lt;/span&gt;&lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;errors&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Lines 91-93:&lt;/strong&gt; &lt;strong&gt;BigQuery upload.&lt;/strong&gt; After all images are processed, uses &lt;code&gt;insert_rows()&lt;/code&gt; to perform a streaming insert of all collected rows into the BigQuery table. The &lt;code&gt;assert&lt;/code&gt; verifies that no errors occurred — if any row failed to insert, the script crashes with an &lt;code&gt;AssertionError&lt;/code&gt;.&lt;/p&gt;
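&lt;p&gt;The bare &lt;code&gt;assert&lt;/code&gt; is fine for a lab, but it discards the error detail. If you adapt this script, a small helper that surfaces what &lt;code&gt;insert_rows()&lt;/code&gt; actually returned is friendlier — a sketch (the sample error shape below is illustrative):&lt;/p&gt;

```python
def check_insert_errors(errors):
    """Raise with detail instead of a bare AssertionError.

    bq_client.insert_rows() returns [] on success, or one mapping per
    failed row describing what went wrong.
    """
    if errors:
        details = '; '.join(str(e) for e in errors)
        raise RuntimeError(f'{len(errors)} row(s) failed to insert: {details}')

# Success: an empty list passes silently.
check_insert_errors([])

# Failure: illustrative error shape only.
try:
    check_insert_errors([{'index': 0, 'errors': [{'reason': 'invalid'}]}])
except RuntimeError as exc:
    print(exc)
```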




&lt;h2&gt;
  
  
  Task 5: Validate with BigQuery
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Run the Verification Query
&lt;/h3&gt;

&lt;p&gt;Go to &lt;strong&gt;BigQuery&lt;/strong&gt; in the Console or use the CLI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;bq query &lt;span class="nt"&gt;--use_legacy_sql&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;false&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s1"&gt;'SELECT locale, COUNT(locale) as lcount FROM image_classification_dataset.image_text_detail GROUP BY locale ORDER BY lcount DESC'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see a breakdown of language codes (e.g., &lt;code&gt;ja&lt;/code&gt;, &lt;code&gt;en&lt;/code&gt;, &lt;code&gt;fr&lt;/code&gt;, &lt;code&gt;de&lt;/code&gt;) with their counts. This confirms the full pipeline worked end-to-end.&lt;/p&gt;




&lt;h2&gt;
  
  
  Quick Reference — All Commands in Order
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# ============================================&lt;/span&gt;
&lt;span class="c"&gt;# TASK 1: Create service account + bind roles&lt;/span&gt;
&lt;span class="c"&gt;# ============================================&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;gcloud config get-value project&lt;span class="si"&gt;)&lt;/span&gt;

gcloud iam service-accounts create my-ml-sa &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--display-name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"ML API Service Account"&lt;/span&gt;

gcloud projects add-iam-policy-binding &lt;span class="nv"&gt;$PROJECT_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--member&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"serviceAccount:my-ml-sa@&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.iam.gserviceaccount.com"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"roles/bigquery.dataEditor"&lt;/span&gt;

gcloud projects add-iam-policy-binding &lt;span class="nv"&gt;$PROJECT_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--member&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"serviceAccount:my-ml-sa@&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.iam.gserviceaccount.com"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"roles/storage.objectAdmin"&lt;/span&gt;

gcloud projects add-iam-policy-binding &lt;span class="nv"&gt;$PROJECT_ID&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--member&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"serviceAccount:my-ml-sa@&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.iam.gserviceaccount.com"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"roles/serviceusage.serviceUsageConsumer"&lt;/span&gt;

&lt;span class="c"&gt;# ============================================&lt;/span&gt;
&lt;span class="c"&gt;# TASK 2: Create credentials + set env var&lt;/span&gt;
&lt;span class="c"&gt;# ============================================&lt;/span&gt;
gcloud iam service-accounts keys create ml-sa-key.json &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--iam-account&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;my-ml-sa@&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;.iam.gserviceaccount.com

&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;GOOGLE_APPLICATION_CREDENTIALS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;PWD&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;/ml-sa-key.json

&lt;span class="c"&gt;# ============================================&lt;/span&gt;
&lt;span class="c"&gt;# TASK 3 &amp;amp; 4: Copy and modify the script&lt;/span&gt;
&lt;span class="c"&gt;# ============================================&lt;/span&gt;
gsutil &lt;span class="nb"&gt;cp &lt;/span&gt;gs://&lt;span class="nv"&gt;$PROJECT_ID&lt;/span&gt;/analyze-images-v2.py &lt;span class="nb"&gt;.&lt;/span&gt;
nano analyze-images-v2.py

&lt;span class="c"&gt;# --- Inside nano, make these 4 edits: ---&lt;/span&gt;
&lt;span class="c"&gt;# 1. After "TBD: Create a Vision API image object":&lt;/span&gt;
&lt;span class="c"&gt;#        image_object = vision.Image(content=file_content)&lt;/span&gt;
&lt;span class="c"&gt;#&lt;/span&gt;
&lt;span class="c"&gt;# 2. After "TBD: Detect text in the image":&lt;/span&gt;
&lt;span class="c"&gt;#        response = vision_client.document_text_detection(image=image_object)&lt;/span&gt;
&lt;span class="c"&gt;#&lt;/span&gt;
&lt;span class="c"&gt;# 3. After "TBD: According to the target language":&lt;/span&gt;
&lt;span class="c"&gt;#        translation = translate_client.translate(desc, target_language='ja')&lt;/span&gt;
&lt;span class="c"&gt;#&lt;/span&gt;
&lt;span class="c"&gt;# 4. Uncomment the last line:&lt;/span&gt;
&lt;span class="c"&gt;#        errors = bq_client.insert_rows(table, rows_for_bq)&lt;/span&gt;
&lt;span class="c"&gt;# --- Save with Ctrl+O, Enter, Ctrl+X ---&lt;/span&gt;

&lt;span class="c"&gt;# ============================================&lt;/span&gt;
&lt;span class="c"&gt;# TASK 5: Run script and validate&lt;/span&gt;
&lt;span class="c"&gt;# ============================================&lt;/span&gt;
python3 analyze-images-v2.py &lt;span class="nv"&gt;$PROJECT_ID&lt;/span&gt; &lt;span class="nv"&gt;$PROJECT_ID&lt;/span&gt;

bq query &lt;span class="nt"&gt;--use_legacy_sql&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;false&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s1"&gt;'SELECT locale, COUNT(locale) as lcount FROM image_classification_dataset.image_text_detail GROUP BY locale ORDER BY lcount DESC'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Troubleshooting
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Problem&lt;/th&gt;
&lt;th&gt;Solution&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;403 USER_PROJECT_DENIED&lt;/code&gt; on BigQuery or API calls&lt;/td&gt;
&lt;td&gt;Add the missing role: &lt;code&gt;gcloud projects add-iam-policy-binding $PROJECT_ID --member="serviceAccount:my-ml-sa@${PROJECT_ID}.iam.gserviceaccount.com" --role="roles/serviceusage.serviceUsageConsumer"&lt;/code&gt; — wait 1-2 min for propagation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;403 ACCESS_DENIED&lt;/code&gt; on Cloud Storage&lt;/td&gt;
&lt;td&gt;You may have used &lt;code&gt;roles/storage.admin&lt;/code&gt; instead of &lt;code&gt;roles/storage.objectAdmin&lt;/code&gt;. Fix: bind the correct role&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;PERMISSION_DENIED&lt;/code&gt; on Vision/Translate API calls&lt;/td&gt;
&lt;td&gt;Enable the APIs: &lt;code&gt;gcloud services enable vision.googleapis.com translate.googleapis.com&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;PERMISSION_DENIED&lt;/code&gt; on BigQuery&lt;/td&gt;
&lt;td&gt;Verify the &lt;code&gt;dataEditor&lt;/code&gt; role was bound correctly; wait 1-2 minutes for IAM propagation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ModuleNotFoundError&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Install packages: &lt;code&gt;pip3 install google-cloud-vision google-cloud-translate google-cloud-bigquery google-cloud-storage google-cloud-language&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Credentials file error&lt;/td&gt;
&lt;td&gt;Verify: &lt;code&gt;echo $GOOGLE_APPLICATION_CREDENTIALS&lt;/code&gt; and &lt;code&gt;ls -la ml-sa-key.json&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;NameError: name 'image_object' is not defined&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;TBD #1 is missing — add &lt;code&gt;image_object = vision.Image(content=file_content)&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;NameError: name 'response' is not defined&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;TBD #2 is missing — add the &lt;code&gt;vision_client.document_text_detection()&lt;/code&gt; call&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;NameError: name 'translation' is not defined&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;TBD #3 is missing — add the &lt;code&gt;translate_client.translate()&lt;/code&gt; call&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Empty BigQuery table&lt;/td&gt;
&lt;td&gt;Confirm you uncommented &lt;code&gt;errors = bq_client.insert_rows(table, rows_for_bq)&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;AssertionError&lt;/code&gt; on &lt;code&gt;assert errors == []&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Check that the BigQuery table &lt;code&gt;image_text_detail&lt;/code&gt; exists in dataset &lt;code&gt;image_classification_dataset&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Script argument error&lt;/td&gt;
&lt;td&gt;Ensure you pass both arguments: &lt;code&gt;python3 analyze-images-v2.py $PROJECT_ID $PROJECT_ID&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Key Learnings
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Service accounts&lt;/strong&gt; are the standard way to provide application-level credentials in GCP. Each service account can have granular IAM roles scoped to specific services.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;GOOGLE_APPLICATION_CREDENTIALS&lt;/code&gt;&lt;/strong&gt; is the universal environment variable that all Google Cloud client libraries check for authentication.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;Vision API&lt;/strong&gt; requires an &lt;code&gt;Image&lt;/code&gt; object created from raw bytes — you can't pass the bytes directly to the detection method.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;Vision API's &lt;code&gt;document_text_detection&lt;/code&gt;&lt;/strong&gt; returns a structured response where the first element in &lt;code&gt;text_annotations&lt;/code&gt; contains the full detected text and its locale.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;Translation API's &lt;code&gt;translate()&lt;/code&gt; method&lt;/strong&gt; returns a dictionary with &lt;code&gt;translatedText&lt;/code&gt;, &lt;code&gt;detectedSourceLanguage&lt;/code&gt;, and &lt;code&gt;input&lt;/code&gt; keys.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;BigQuery's &lt;code&gt;insert_rows()&lt;/code&gt;&lt;/strong&gt; performs streaming inserts and returns an empty list on success.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Always read the existing code&lt;/strong&gt; before modifying — variable names like &lt;code&gt;vision_client&lt;/code&gt;, &lt;code&gt;desc&lt;/code&gt;, and &lt;code&gt;image_object&lt;/code&gt; are defined by the script and must be used exactly as expected.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use &lt;code&gt;roles/storage.objectAdmin&lt;/code&gt;&lt;/strong&gt; instead of &lt;code&gt;roles/storage.admin&lt;/code&gt; — it grants object-level read/write/delete without unnecessary bucket-level management permissions.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Best Practices
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Principle of least privilege&lt;/strong&gt;: Only grant the roles your service account actually needs (&lt;code&gt;dataEditor&lt;/code&gt; for BigQuery writes, &lt;code&gt;storage.objectAdmin&lt;/code&gt; for GCS object access, &lt;code&gt;serviceUsageConsumer&lt;/code&gt; for API consumption).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test incrementally&lt;/strong&gt;: Run the script after each modification to catch errors early rather than debugging everything at once.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Environment variables for credentials&lt;/strong&gt;: Never hard-code paths to credential files in your scripts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Read the existing code carefully&lt;/strong&gt;: Variable names matter — using &lt;code&gt;vision_client&lt;/code&gt; vs &lt;code&gt;client&lt;/code&gt; or &lt;code&gt;desc&lt;/code&gt; vs &lt;code&gt;text&lt;/code&gt; can cause &lt;code&gt;NameError&lt;/code&gt; exceptions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use &lt;code&gt;document_text_detection&lt;/code&gt; over &lt;code&gt;text_detection&lt;/code&gt;&lt;/strong&gt; when dealing with dense text in images — it uses a more advanced OCR model.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This challenge lab walks you through a realistic ML pipeline pattern: ingest raw data (images), enrich it using ML APIs (Vision + Translation), and store structured results for analysis (BigQuery). These same building blocks — Cloud Storage for data lake, ML APIs for enrichment, BigQuery for analytics — appear in production architectures across industries. Mastering this flow gives you a solid foundation for building more complex ML data pipelines on Google Cloud.&lt;/p&gt;

</description>
      <category>googlecloud</category>
      <category>machinelearning</category>
      <category>python</category>
      <category>googleaichallenge</category>
    </item>
  </channel>
</rss>
