Building GeminiLens: An Interactive Educational Explainer with Google Gemini and Cloud Run

Oleksiy

Hi everyone!

I’m excited to share the technical journey behind GeminiLens, my entry for the #GeminiLiveAgentChallenge. GeminiLens is an adaptive AI teacher that doesn't just talk at you—it explains complex concepts using text, dynamically generated diagrams (Imagen), and even generated videos (Veo).

(Disclaimer: I created this piece of content for the purposes of entering this hackathon.)

The Problem: Textbooks Are Static

Learning a new concept often requires more than just reading. You need visualization, summarization, and interactivity. I wanted to build an agent that acts like a human academic mentor—someone who knows when to draw a diagram on the whiteboard or when to switch to a video explanation.

The Solution: A Multi-Modal Agent with Google GenAI

The core of GeminiLens is built using the google-genai SDK. I leveraged Gemini's function calling capabilities (Tools) to give the model "hands" to create content.

1. Defining the Persona

The "brain" of the application is a persistent chat session. In main.py, I defined a strict system instruction to ensure the model behaves like a mentor and uses its visual tools proactively:

system_instruction = (
    "You are the GeminiLens Academic Mentor. "
    "Explain complex concepts clearly, utilizing text, and whenever helpful, generate educational diagrams to illustrate your points. "
    "Use the `generate_educational_diagram` tool to create visuals. "
    "When a user asks to summarize a lesson or create a deck, use the `create_presentation_deck` tool. Map complex concepts to slides. "
    "Use previously generated Imagen diagrams or Veo video URLs in the media_url field to make the slides visual. "
    "CRITICAL: You MUST ALWAYS include a detailed textual explanation in your responses. Never return only an image or diagram without accompanying text. "
    "Return your final explanation in Markdown format..."
)

2. Wiring Up the Tools

GeminiLens isn't limited to text. I registered Python functions as tools that the model can invoke. For example, here is how I integrated Imagen to generate educational diagrams on the fly:

def generate_educational_diagram(prompt: str) -> str:
    """
    Generates an educational diagram or image based on the given prompt using Google's Imagen model.
    Call this tool when you need to visually explain a concept to the user.
    """
    print(f"DEBUG: Generating image for prompt: {prompt}")
    try:
        # Call Imagen model
        result = client.models.generate_images(
            model=settings.IMAGEN_MODEL_ID,
            prompt=prompt,
            config=types.GenerateImagesConfig(
                number_of_images=1, aspect_ratio="16:9", person_generation="DONT_ALLOW"
            ),
        )

        # ... (saving logic) ...

        return f"/static/images/{image_filename}"
    except Exception as e:
        return f"Error generating image: {str(e)}"

I then initialized the chat session with these tools attached, allowing Gemini to decide when to draw and when to speak:

# Global chat session to keep history across requests
global_chat = client.chats.create(
    model=settings.MAIN_MODEL_ID,
    config=types.GenerateContentConfig(
        system_instruction=system_instruction,
        tools=[tool_generate_diagram, tool_create_presentation],
        temperature=0.7,
    ),
)
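With the tools attached, a single `send_message` call is enough: the SDK's automatic function calling matches the tool name the model requests against the registered Python callables and feeds the return value back into the conversation. Here is a toy, SDK-free sketch of that dispatch step (all names here are mine, for illustration only; the fake diagram tool returns a static path in place of a real Imagen call):

```python
from typing import Callable

# Registered tools keyed by name -- mirrors the tools=[...] list above.
# The fake diagram tool stands in for the real Imagen-backed function.
TOOLS: dict[str, Callable[..., str]] = {
    "generate_educational_diagram": lambda prompt: "/static/images/diagram.png",
}

def dispatch_tool_call(name: str, **kwargs) -> str:
    """Run the tool the model requested and return its result as a string."""
    if name not in TOOLS:
        return f"Error: unknown tool '{name}'"
    return TOOLS[name](**kwargs)

print(dispatch_tool_call("generate_educational_diagram", prompt="a binary tree"))
```

In the real app this loop lives inside the google-genai SDK; the point is only that each tool's return value flows back to Gemini as an ordinary function response, which the model then weaves into its Markdown answer.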

The Infrastructure: Fast and Scalable with Cloud Run

To host the API, I chose FastAPI running on Google Cloud Run. Cloud Run is a good fit here: it runs the container without any cluster management, scales out automatically under load, and scales to zero when idle.

The application serves the frontend via Jinja2 templates and exposes endpoints like /api/explain (for the main chat) and /api/generate_video (which triggers the Veo model).

Automated Deployment (Infrastructure-as-Code)

A critical part of modern cloud engineering is automation. Instead of manually clicking through the Google Cloud Console, I wrote a shell script to automate the build and deploy process.

My deploy.sh script handles everything from building the container image with Cloud Build to deploying it to Cloud Run with the necessary environment variables.

Here is the actual script used to deploy GeminiLens:

#!/bin/bash
set -e

# Configuration
PROJECT_ID="gemini-lens-hackathon"
REGION="us-central1"
APP_NAME="gemini-lens"
IMAGE_NAME="gcr.io/$PROJECT_ID/$APP_NAME"

echo "Deploying GeminiLens Interactive Educational Explainer..."

# ... (gcloud checks and config set) ...

echo "Building and submitting Docker image via Cloud Build..."
gcloud builds submit --tag "$IMAGE_NAME"

echo "Deploying to Cloud Run..."
# NOTE: prompting avoids committing the key; for production, prefer Secret Manager.
read -p "Enter your GOOGLE_API_KEY to inject into the deployment: " API_KEY

gcloud run deploy "$APP_NAME" \
  --image "$IMAGE_NAME" \
  --region "$REGION" \
  --allow-unauthenticated \
  --set-env-vars="GOOGLE_API_KEY=$API_KEY"

echo "Deployment complete! Visit the URL provided by Cloud Run above."

This script ensures that every deployment is consistent. It creates a container image stored in the Google Container Registry and then spins up a fresh Cloud Run revision.

What I Learned

Building GeminiLens taught me the power of combining models. Using Gemini for reasoning and conversation, while offloading visual tasks to Imagen and Veo, creates a much richer user experience than a standard text chatbot.

Check out the full source code and try deploying it yourself here: GitHub
