This is part 1 in a series!
Show me the code:
https://github.com/dllewellyn/hello-gemini
Introduction
In this series of tutorials, we’re going to use a suite of Google tools — AI, cloud and dev — to create and deploy an AI-powered application. We’ll steer clear of chatbots, because they are tedious in the extreme, and focus on building something more interesting.
Creating a “Hello World” Flask Application with Google IDX and the Gemini API
To get started, we’re going to use the ‘hello world’ IDX Gemini and Flask template application. This gives us a really quick way to get set up with some AI tooling.
Project Setup
- Navigate to Google IDX: Begin by accessing the Google IDX platform in your web browser.
- Select the Gemini API Template: From the IDX welcome screen, locate and click on the “Gemini API” template under the “Start something new with a template” section.
- Configure New Workspace: A “New Workspace” window will appear.
- Name your workspace: I’ve just called it “hello-gemini.”
- Environment: Choose the “Python Web App (Flask)” option from the dropdown menu.
- Create Workspace: Once configured, click the “Create” button to initialise the workspace creation.
- Await Setup Completion: IDX will set up the necessary environment for your Flask application.
With your workspace ready, we’ve got a basic Gemini application.
Looking through the Hello World application
Obtain a Gemini API Key
Before you begin, you’ll need an API key to access the Gemini API.
- Visit the Google Cloud Console.
- Navigate to the ‘API Keys’ section.
- Click ‘Create API Key’.
- Choose an existing Google Cloud project or create a new one.
- Copy the generated API key. Remember to store your API key securely!
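Since the key needs storing securely, one option (an assumption on my part, not something the template enforces) is to read it from an environment variable rather than pasting it into source. A minimal sketch, where the variable name GOOGLE_API_KEY is just a convention used here:

```python
import os


def load_api_key(env=os.environ):
    """Read the Gemini API key from the environment instead of hardcoding
    it in main.py. GOOGLE_API_KEY is a hypothetical variable name chosen
    for this sketch, not something the template requires."""
    key = env.get("GOOGLE_API_KEY")
    if not key:
        raise RuntimeError("Set GOOGLE_API_KEY before starting the app")
    return key


# Example: pass a plain dict to demonstrate the lookup
key = load_api_key({"GOOGLE_API_KEY": "demo-key"})
```

You'd then call `genai.configure(api_key=load_api_key())` instead of hardcoding the key.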
Set up the Flask Application
Check the existing app in main.py and update it with your actual API key. This is the basic setup of the Gemini API:
import os
import json
import google.generativeai as genai
from flask import Flask, render_template, request, jsonify

# Replace 'YOUR_API_KEY' with your actual API key
API_KEY = 'YOUR_API_KEY'
genai.configure(api_key=API_KEY)

app = Flask(__name__)
Check the HTML Template
Take a look in index.html, which serves as the front-end for your web application. This template displays images of baked goods, an input field for the prompt, and a results section.
<!DOCTYPE html>
<html>
<head>
<title>Baking with the Gemini API</title>
</head>
<body>
<h1>Baking with the Gemini API</h1>
<form>
<div>
<label><input type="radio" name="chosen-image" value="images/baked-good-1.jpg" checked><img src="images/baked-good-1.jpg" alt="Baked Good 1"></label>
<label><input type="radio" name="chosen-image" value="images/baked-good-2.jpg"><img src="images/baked-good-2.jpg" alt="Baked Good 2"></label>
<label><input type="radio" name="chosen-image" value="images/baked-good-3.jpg"><img src="images/baked-good-3.jpg" alt="Baked Good 3"></label>
</div>
<div>
<label for="prompt">Provide an example recipe for the baked goods in:</label>
<input type="text" id="prompt" name="prompt">
<button type="submit">Go</button>
</div>
</form>
<div class="output">Results will appear here</div>
<!-- main.js expects the form, the chosen-image radios and the .output
     element; it also relies on base64js and markdownit, which the
     template loads for you -->
<script type="module" src="main.js"></script>
</body>
</html>
You’ll see three images on the screen and a prompt input.
Define Flask Routes and Functions
In your main.py file, define the routes and functions to handle requests from the front-end and interact with the Gemini API.
@app.route('/')
def index():
    return render_template('index.html')


@app.route("/api/generate", methods=["POST"])
def generate_api():
    if request.method == "POST":
        try:
            req_body = request.get_json()
            content = req_body.get("contents")
            model = genai.GenerativeModel(model_name=req_body.get("model"))
            response = model.generate_content(content, stream=True)

            def stream():
                for chunk in response:
                    yield 'data: %s\n\n' % json.dumps({"text": chunk.text})

            return stream(), {'Content-Type': 'text/event-stream'}
        except Exception as e:
            return jsonify({"error": str(e)})
In this block of code, you’ll notice that we take the ‘model’ and the ‘contents’ from the request body, which have been sent to the API from the JavaScript front-end.
We ‘stream’ the response — which means we can pass that streamed content back to the front-end as it arrives, so the user isn’t waiting for everything to finish before seeing anything.
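The server-sent-events format the route yields is simple enough to sketch in isolation. Here’s a minimal, hypothetical helper (not part of the template) that formats text chunks the same way the stream() generator above does:

```python
import json


def to_sse_events(chunks):
    """Format an iterable of text chunks as server-sent events.

    Each event is a 'data: <json>' line followed by a blank line,
    matching what the Flask stream() generator yields."""
    for chunk in chunks:
        yield 'data: %s\n\n' % json.dumps({"text": chunk})


# Example: three model chunks become three SSE events
events = list(to_sse_events(["Preheat ", "the oven ", "to 180C."]))
```

The blank line (`\n\n`) is what marks the end of one event, which is why the front-end splits the stream on it.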
Run the Flask Application
If you’re running inside IDX, you can just view the app in the ‘web preview’ window: press Cmd+Shift+P (on a Mac) and enter ‘Web preview’ to open it. You should then be able to access your web application in the browser, select an image of a baked good, enter a prompt, and generate a baking recipe using the Gemini API.
Hit generate and you’ll see a recipe based on the image you’ve selected:
Analysing the JavaScript
If you look through main.js, you’ll see this:
import { streamGemini } from './gemini-api.js';

let form = document.querySelector('form');
let promptInput = document.querySelector('input[name="prompt"]');
let output = document.querySelector('.output');

form.onsubmit = async (ev) => {
  ev.preventDefault();
  output.textContent = 'Generating...';

  try {
    // Load the image as a base64 string
    let imageUrl = form.elements.namedItem('chosen-image').value;
    let imageBase64 = await fetch(imageUrl)
      .then(r => r.arrayBuffer())
      .then(a => base64js.fromByteArray(new Uint8Array(a)));

    // Assemble the prompt by combining the text with the chosen image
    let contents = [
      {
        role: 'user',
        parts: [
          { inline_data: { mime_type: 'image/jpeg', data: imageBase64 } },
          { text: promptInput.value }
        ]
      }
    ];

    // Call the multimodal model, and get a stream of results
    let stream = streamGemini({
      model: 'gemini-1.5-flash', // or gemini-1.5-pro
      contents,
    });

    // Read from the stream and interpret the output as markdown
    let buffer = [];
    let md = new markdownit();
    for await (let chunk of stream) {
      buffer.push(chunk);
      output.innerHTML = md.render(buffer.join(''));
    }
  } catch (e) {
    output.innerHTML += '<hr>' + e;
  }
};
In it you can see that when the form is submitted, it passes the model and the ‘contents’ — this contents object is what Gemini expects, and in Python it’s mostly passed straight through:
content = req_body.get("contents")
model = genai.GenerativeModel(model_name=req_body.get("model"))
response = model.generate_content(content, stream=True)
It’s quite a straightforward setup — we base64-encode the image, upload it along with the prompt to Gemini, and then stream the response.
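The same payload the browser assembles can be sketched in Python. This is a hypothetical helper for illustration only (the template builds it in JavaScript) — it shows the shape of the contents list that the backend passes to generate_content:

```python
import base64


def build_contents(image_bytes, prompt):
    """Build a Gemini-style 'contents' payload from raw image bytes and a
    text prompt, mirroring what main.js assembles in the browser."""
    image_base64 = base64.b64encode(image_bytes).decode("ascii")
    return [
        {
            "role": "user",
            "parts": [
                {"inline_data": {"mime_type": "image/jpeg", "data": image_base64}},
                {"text": prompt},
            ],
        }
    ]


# Example with a few placeholder JPEG bytes
contents = build_contents(b"\xff\xd8\xff", "Provide an example recipe")
```

The key detail is the parts list: one inline_data part carrying the base64 image, one text part carrying the prompt, both under a single 'user' role.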
You can also see that the streamed response is handled by the gemini-api.js file:
/**
 * Calls the given Gemini model with the given image and/or text
 * parts, streaming output (as a generator function).
 */
export async function* streamGemini({
  model = 'gemini-1.5-flash', // or gemini-1.5-pro
  contents = [],
} = {}) {
  // Send the prompt to the Python backend
  // Call API defined in main.py
  let response = await fetch("/api/generate", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ model, contents })
  });

  yield* streamResponseChunks(response);
}

/**
 * A helper that streams text output chunks from a fetch() response.
 */
async function* streamResponseChunks(response) {
  let buffer = '';
  const CHUNK_SEPARATOR = '\n\n';

  let processBuffer = async function* (streamDone = false) {
    while (true) {
      let flush = false;
      let chunkSeparatorIndex = buffer.indexOf(CHUNK_SEPARATOR);
      if (streamDone && chunkSeparatorIndex < 0) {
        flush = true;
        chunkSeparatorIndex = buffer.length;
      }
      if (chunkSeparatorIndex < 0) {
        break;
      }

      let chunk = buffer.substring(0, chunkSeparatorIndex);
      buffer = buffer.substring(chunkSeparatorIndex + CHUNK_SEPARATOR.length);
      chunk = chunk.replace(/^data:\s*/, '').trim();
      if (!chunk) {
        if (flush) break;
        continue;
      }

      let { error, text } = JSON.parse(chunk);
      if (error) {
        console.error(error);
        throw new Error(error?.message || JSON.stringify(error));
      }
      yield text;
      if (flush) break;
    }
  };

  const reader = response.body.getReader();
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      buffer += new TextDecoder().decode(value);
      yield* processBuffer();
    }
  } finally {
    reader.releaseLock();
  }
  yield* processBuffer(true);
}
In this file, the response stream from the server is read chunk by chunk, split on the SSE separators, and the text is yielded back to main.js, where it’s rendered into markdown as the data comes back to the front-end.
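That chunk-splitting logic is easy to mirror in Python. Here’s a hypothetical equivalent for illustration only (the template does this in the browser): it splits an SSE buffer on blank lines and extracts the text field from each 'data:' chunk:

```python
import json


def parse_sse_buffer(buffer):
    """Split a server-sent-events buffer on blank lines and extract the
    'text' field from each 'data: <json>' chunk, mirroring what
    streamResponseChunks does in gemini-api.js."""
    texts = []
    for chunk in buffer.split("\n\n"):
        # Strip the 'data:' prefix and surrounding whitespace
        chunk = chunk.removeprefix("data:").strip()
        if not chunk:
            continue  # skip empty trailing fragments
        payload = json.loads(chunk)
        if "error" in payload:
            raise RuntimeError(payload["error"])
        texts.append(payload["text"])
    return texts


# Example: a buffer containing two complete SSE events
parts = parse_sse_buffer('data: {"text": "Mix "}\n\ndata: {"text": "well."}\n\n')
```

The real implementation is trickier because chunks can arrive split mid-event, which is why the JavaScript keeps a rolling buffer and only flushes what’s complete.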
That’s a wrap
We’ve covered a fair bit in this hello-world application: how to get Gemini set up in a Python and Flask app inside IDX, how to stream the response, and how to use images as part of it!