Recently, I had the opportunity to explore a series of hands-on labs using Google’s Vertex AI, specifically focusing on generative AI capabilities. Over a few short sessions, I built and tested applications that could analyze images, generate visuals from text, and simulate human-like chat — all using pre-trained models like Gemini and Imagen.
Generative AI on Vertex AI (also known as genAI or gen AI) gives you access to Google’s large generative AI models so you can test, tune, and deploy them for use in your AI-powered applications.
In this lab, you will:
- Connect to Vertex AI: Learn how to establish a connection to Google Cloud’s AI platform using the Vertex AI SDK.
- Work with Pre-trained Models: Use powerful, pre-trained generative models like Gemini for text and image understanding, and Image Generation Model for creating visuals from text prompts.
- Send Inputs to the Models: Provide text or a combination of image + text as input for analysis or generation.
- Interpret Model Outputs: Extract either AI-generated images or text-based answers, depending on the model used.
- Explore AI Application Basics: Understand key concepts involved in integrating AI capabilities into real-world software projects.
These labs were quick (about 15–20 minutes each), but they provided a solid foundation for working with multi-modal AI systems. Here’s a breakdown of what I learned and built along the way.
1. Image Understanding with Gemini
In the first lab, I connected to Vertex AI using the Python SDK and worked with Gemini, a pre-trained multi-modal model that can understand both images and text. The task was to feed the model an image and ask it to describe what’s in the picture.
Here’s what the code looked like in essence:
```python
from google import genai
from google.genai.types import HttpOptions, Part

client = genai.Client(http_options=HttpOptions(api_version="v1"))
response = client.models.generate_content(
    model="gemini-2.0-flash-001",
    contents=[
        "What is shown in this image?",
        Part.from_uri(
            file_uri="gs://cloud-samples-data/generative-ai/image/scones.jpg",
            mime_type="image/jpeg",
        ),
    ],
)
print(response.text)
```
To set the environment variables in the new terminal, run the following commands (replacing the placeholders with your project ID and region):

```bash
export GOOGLE_CLOUD_PROJECT="project-id"
export GOOGLE_CLOUD_LOCATION="REGION"
export GOOGLE_GENAI_USE_VERTEXAI=True
```

Then execute the script:

```bash
/usr/bin/python3 /genai.py
```
This snippet loads the Gemini model (gemini-2.0-flash-001) and uses the generate_content() method to analyze an image based on a prompt. It combines image and text input, and the model returns a descriptive response, showcasing its ability to understand both formats together. It was fascinating to see how accurately the model interpreted the image, with clear applications in visual search, inventory tagging, or accessibility tools.
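The same call also works with raw bytes, which is handy when the image lives on disk rather than in Cloud Storage. Here’s a minimal sketch of that variant (my own, not from the lab; the local file name is hypothetical):

```python
from google import genai
from google.genai.types import HttpOptions, Part

client = genai.Client(http_options=HttpOptions(api_version="v1"))

# Read a local JPEG (hypothetical local copy of the sample image).
with open("scones.jpg", "rb") as f:
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-2.0-flash-001",
    contents=[
        "What is shown in this image?",
        # Pass raw bytes instead of a gs:// URI.
        Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
    ],
)
print(response.text)
```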
2. Generating Images with Imagen
Next up was Imagen, another model available on Vertex AI. This one focuses on text-to-image generation. I created a Python function that takes a text prompt (e.g., “a cricket ground in the heart of Los Angeles”) and generates a high-quality image based on it.
Here’s a sample of the function I used:
```python
import argparse

import vertexai
from vertexai.preview.vision_models import ImageGenerationModel


def generate_image(
    project_id: str, location: str, output_file: str, prompt: str
) -> vertexai.preview.vision_models.ImageGenerationResponse:
    """Generate an image using a text prompt.

    Args:
        project_id: Google Cloud project ID, used to initialize Vertex AI.
        location: Google Cloud region, used to initialize Vertex AI.
        output_file: Local path to the output image file.
        prompt: The text prompt describing what you want to see.
    """
    vertexai.init(project=project_id, location=location)

    model = ImageGenerationModel.from_pretrained("imagen-3.0-generate-002")

    images = model.generate_images(
        prompt=prompt,
        # Optional parameters
        number_of_images=1,
        seed=1,
        add_watermark=False,
    )

    images[0].save(location=output_file)

    return images


generate_image(
    project_id="project-id",  # placeholder: your Google Cloud project ID
    location="REGION",        # placeholder: your Google Cloud region
    output_file="image.jpeg",
    prompt="Create an image of a cricket ground in the heart of Los Angeles",
)
```
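Since the script imports argparse, it’s presumably meant to be run from the command line. Here’s one way the entry point might look in place of the hard-coded call above (a sketch; the flag names are my assumption, not the lab’s):

```python
# Hypothetical CLI entry point; flag names are assumptions, not from the lab.
if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Generate an image with Imagen.")
    parser.add_argument("--project-id", required=True, help="Google Cloud project ID")
    parser.add_argument("--location", default="us-central1", help="Google Cloud region")
    parser.add_argument("--output-file", default="image.jpeg", help="Path for the saved image")
    parser.add_argument("--prompt", required=True, help="Text description of the image")
    args = parser.parse_args()

    generate_image(
        project_id=args.project_id,
        location=args.location,
        output_file=args.output_file,
        prompt=args.prompt,
    )
```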
What impressed me most was how coherent and realistic the generated images were. This capability can be useful in creative domains — think marketing, e-commerce mockups, or game design.
3. Building Chat Apps with Gemini
Another lab focused on simulating human-like chat interactions using Gemini. I tried both non-streaming and streaming modes of sending and receiving messages. Here’s how it worked:
- Without streaming: You get a complete response once the model finishes generating.
- With streaming: The response is delivered in real-time chunks, giving it a more conversational feel — much like how we interact with real humans.
Here’s a quick snippet using streaming:
```python
from google import genai
from google.genai.types import HttpOptions
from google.cloud import logging as gcp_logging

# Route application logs to Cloud Logging.
gcp_logging_client = gcp_logging.Client()
gcp_logging_client.setup_logging()

client = genai.Client(
    vertexai=True,
    project="project-id",  # placeholder: your Google Cloud project ID
    location="REGION",     # placeholder: your Google Cloud region
    http_options=HttpOptions(api_version="v1"),
)

chat = client.chats.create(model="gemini-2.0-flash-001")

response_text = ""
for chunk in chat.send_message_stream("What are all the colors in a rainbow?"):
    print(chunk.text, end="")
    response_text += chunk.text
```
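For comparison, the non-streaming mode described above comes down to a single call (a sketch reusing the same `chat` object):

```python
# Non-streaming: the full answer arrives only after generation finishes.
response = chat.send_message("What are all the colors in a rainbow?")
print(response.text)
```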
This could easily serve as the backend for a customer support bot, tutoring assistant, or a voice-enabled interface.
4. Putting It All Together in a Challenge Lab
The final lab combined everything into a mini-project. I built a multi-modal application where users could describe a bouquet (“2 sunflowers and 3 roses”), generate the image using Imagen, and then analyze that image using Gemini to generate a custom birthday message.
The workflow looked like this:
1. `generate_bouquet_image(prompt)` – Uses Imagen to create and save an image.
2. `analyze_bouquet_image(image_path)` – Uses Gemini to interpret the image and generate text output (with streaming enabled).
This kind of multi-modal application opens up many possibilities, from personalized content creation to automated catalog generation.
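Here’s a rough sketch of how those two steps might chain together. The function bodies are my reconstruction from the earlier snippets (same models, same environment variables), not the lab’s exact solution:

```python
import vertexai
from vertexai.preview.vision_models import ImageGenerationModel
from google import genai
from google.genai.types import HttpOptions, Part

# Placeholders, as in the earlier snippets.
vertexai.init(project="project-id", location="REGION")


def generate_bouquet_image(prompt: str, output_file: str = "bouquet.jpeg") -> str:
    """Use Imagen to create and save an image; returns the local file path."""
    model = ImageGenerationModel.from_pretrained("imagen-3.0-generate-002")
    images = model.generate_images(prompt=prompt, number_of_images=1)
    images[0].save(location=output_file)
    return output_file


def analyze_bouquet_image(image_path: str) -> None:
    """Use Gemini (streaming) to turn the image into a birthday message."""
    # Relies on the GOOGLE_GENAI_USE_VERTEXAI / project / location
    # environment variables set earlier.
    client = genai.Client(http_options=HttpOptions(api_version="v1"))
    with open(image_path, "rb") as f:
        image_bytes = f.read()
    chat = client.chats.create(model="gemini-2.0-flash-001")
    for chunk in chat.send_message_stream(
        [
            "Write a short birthday message inspired by this bouquet.",
            Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
        ]
    ):
        print(chunk.text, end="")


path = generate_bouquet_image("2 sunflowers and 3 roses")
analyze_bouquet_image(path)
```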
Final Thoughts
These labs were concise, beginner-friendly, and practical. More importantly, they demonstrated how easily powerful AI models can be integrated into real applications using just a few lines of Python and Vertex AI’s tools.
What stood out to me the most:
- I didn’t need to train any model from scratch.
- The SDK made deployment and integration straightforward.
- Switching between tasks (image generation, analysis, chat) was seamless.
Whether you’re exploring AI for personal projects or professional applications, tools like Vertex AI are making it more accessible than ever.
Thanks for reading! If you’ve also worked with generative AI tools or are curious about Vertex AI, feel free to drop a comment or share your thoughts.
✍️ Written by Shraddha Shetty — Senior Data Engineer, AI Enthusiast and MLOps Practitioner.