DEV Community

dllewellyn
dllewellyn

Posted on • Edited on

Build a serverless AI App with Cloud run, Python, Gemini and Vertex AI

In this series of tutorials, we’re going to use a suite of google tools - AI, cloud and dev - in order to create and deploy an AI powered application. We’ll steer clear on chatbots, because they are a tedious in the extreme and focus on building something more interesting.

Creating a "Hello World" Flask Application with Google IDX and the Gemini API

To get started, we're going to use the 'hello world' IDX - Gemini and flask application. This gives us a really quick way to get setup with some AI.

Project Setup

  1. Navigate to Google IDX: Begin by accessing the Google IDX platform in your web browser.
  2. Select the Gemini API Template: From the IDX welcome screen, locate and click on the "Gemini API" template under the "Start something new with a template" section.

    select-gemini-api.png

  3. Configure New Workspace: A "New Workspace" window will appear. Here, customise the following settings:

    • Name your workspace: Provide a descriptive name for your project, such as "hello-gemini."
    • Environment: Choose the "Python Web App (Flask)" option from the dropdown menu.

configure-workspace.png

  1. Create Workspace: Once configured, click the "Create" button to initialise the workspace creation.
  2. Await Setup Completion: IDX will set up the necessary environment for your Flask application. This process might take a few moments, and a progress bar will indicate the status.

Next Steps

With your workspace ready, you can proceed to develop your Flask application within the provided environment.

This simple setup using Google IDX provides a foundation for building and deploying Flask applications that interact with the Gemini API.

Getting started with IDX

Go to https://idx.google.com/ - you'll need a google account to get started, but you'll also need it for the rest of the gemini/ vertex stuff too.

Looking through the Hello World application

Obtain a Gemini API Key

Before you begin, you'll need an API key to access the Gemini API.

  1. Visit the Google Cloud Console.
  2. Navigate to the 'API Keys' section.
  3. Click 'Create API Key'.
  4. Choose an existing Google Cloud project or create a new one.
  5. Copy the generated API key. Remember to store your API key securely!

    creating-gemini-api-key.png

Set up the Flask Application

Create a Python file (e.g., main.py) and install the necessary libraries:

import os
import json
from google.generativeai import genai
from flask import Flask, render_template, request, jsonify

# Replace 'YOUR_API_KEY' with your actual API key
API_KEY = 'YOUR_API_KEY'
genai.configure(api_key=API_KEY)
app = Flask(__name__)

# ... (Rest of the code will be added in the following steps)

Enter fullscreen mode Exit fullscreen mode

Create the HTML Template

Create an HTML file (e.g., index.html) to serve as the front-end for your web application. This template will display images of baked goods, an input field for the prompt, and a results section.

<!DOCTYPE html>
<html>
<head>
    <title>Baking with the Gemini API</title>
</head>
<body>
    <h1>Baking with the Gemini API</h1>
    <div>
        <img src="images/baked-good-1.jpg" alt="Baked Good 1">
        <img src="images/baked-good-2.jpg" alt="Baked Good 2">
        <img src="images/baked-good-3.jpg" alt="Baked Good 3">
    </div>
    <div>
        <label for="prompt">Provide an example recipe for the baked goods in:</label>
        <input type="text" id="prompt" name="prompt">
        <button onclick="generateRecipe()">Go</button>
    </div>
    <div id="results">
        <h2>Results will appear here</h2>
    </div>

    <script>
        function generateRecipe() {
            // ... (JavaScript code to handle user input and send requests to the Flask app)
        }
    </script>
</body>
</html>

Enter fullscreen mode Exit fullscreen mode

Define Flask Routes and Functions

In your main.py file, define the routes and functions to handle requests from the front-end and interact with the Gemini API.

Build a serverless AI App with Cloud run, Python, Gemini and Vertex AI

This is part 1 in a series!

Show me the code:

https://github.com/dllewellyn/hello-gemini

Introduction

In this series of tutorials, we’re going to use a suite of google tools — AI, cloud and dev — in order to create and deploy an AI powered application. We’ll steer clear on chatbots, because they are a tedious in the extreme and focus on building something more interesting.

Creating a “Hello World” Flask Application with Google IDX and the Gemini API

To get started, we’re going to use the ‘hello world’ IDX — Gemini and flask application. This gives us a really quick way to get setup with some AI tooling.

Project Setup

  1. Navigate to Google IDX: Begin by accessing the Google IDX platform in your web browser.
  2. Select the Gemini API Template: From the IDX welcome screen, locate and click on the “Gemini API” template under the “Start something new with a template” section.

!https://cdn-images-1.medium.com/max/800/1*3UbWMkF7ijeiRDG11z7KDw.png

  1. Configure New Workspace: A “New Workspace” window will appear.
  2. Name your workspace: I’ve just called it “hello-gemini.”
  3. Environment: Choose the “Python Web App (Flask)” option from the dropdown menu.

!https://cdn-images-1.medium.com/max/800/1*s9j9Z4H7RirlczOCKohNiw.png

  1. Create Workspace: Once configured, click the “Create” button to initialise the workspace creation.
  2. Await Setup Completion: IDX will set up the necessary environment for your Flask application.

With your workspace ready, we’ve got a basic ‘gemini’ application

Looking through the Hello World application

Obtain a Gemini API Key

!https://cdn-images-1.medium.com/max/800/1*c6tBFPL_FQN9mSHMu84AiQ.png

Before you begin, you’ll need an API key to access the Gemini API.

  1. Visit the Google Cloud Console.
  2. Navigate to the ‘API Keys’ section.
  3. Click ‘Create API Key’.
  4. Choose an existing Google Cloud project or create a new one.
  5. Copy the generated API key. Remember to store your API key securely!

Set up the Flask Application

Check the existing app in main.py — update this with your actual API key.

This is the basic setup of the gemini API

import os
import json
from google.generativeai import genai
from flask import Flask, render_template, request, jsonify

# Replace 'YOUR_API_KEY' with your actual API key
API_KEY = 'YOUR_API_KEY'
genai.configure(api_key=API_KEY)
app = Flask(__name__)
Enter fullscreen mode Exit fullscreen mode

Check theHTML Template

Take a look in index.htmlto serve as the front-end for your web application. This template will display images of baked goods, an input field for the prompt, and a results section.

<!DOCTYPE html>
<html>
<head>
    <title>Baking with the Gemini API</title>
</head>
<body>
    <h1>Baking with the Gemini API</h1>
    <div>
        <img src="images/baked-good-1.jpg" alt="Baked Good 1">
        <img src="images/baked-good-2.jpg" alt="Baked Good 2">
        <img src="images/baked-good-3.jpg" alt="Baked Good 3">
    </div>
    <div>
        <label for="prompt">Provide an example recipe for the baked goods in:</label>
        <input type="text" id="prompt" name="prompt">
        <button onclick="generateRecipe()">Go</button>
    </div>
    <div id="results">
        <h2>Results will appear here</h2>
    </div>
</body>
</html>
Enter fullscreen mode Exit fullscreen mode

You’ll see three images on the screen and a prompt input.

Define Flask Routes and Functions

In your main.py file, define the routes and functions to handle requests from the front-end and interact with the Gemini API.

@app.route('/')
def index():
  return render_template('index.html')

@app.route("/api/generate", methods=["POST"])
def generate_api():
    if request.method == "POST":
        try:
            req_body = request.get_json()
            content = req_body.get("contents")
            model = genai.GenerativeModel(model_name=req_body.get("model"))
            response = model.generate_content(content, stream=True)
            def stream():
                for chunk in response:
                    yield 'data: %s\n\n' % json.dumps({ "text": chunk.text })

            return stream(), {'Content-Type': 'text/event-stream'}

        except Exception as e:
            return jsonify({ "error": str(e) })
Enter fullscreen mode Exit fullscreen mode

In this block of code, you’ll notice that we take the ‘model’ from the input, and the ‘contents’ which have been fired to the API from javascript.

We ‘stream’ the response — and that means we can pass that streamed content back to the frontend so they’re not waiting for everything to finish before showing it to the user.

Run the Flask Application

If you’re inside IDX, you can just view it in the ‘preview’ window

!https://cdn-images-1.medium.com/max/800/1*Xl8qXDvkkDJ9PoozstwupA.png

Now, you should be able to access your web application in your browser, select an image of a baked good, enter a prompt, and generate a baking recipe using the Gemini API. If you’re running inside of IDX, you can just run the ‘web preview’ — click Cmd+Shift+P and enter ‘Web preview’ (if on a mac) and you’ll see the preview window.

Hit generate and you’ll see a recipe based on the image you’ve selected:

!https://cdn-images-1.medium.com/max/800/1*9hBTJhjNHWv4lmZ-yFvMAw.png

Analysing the javascript

If you look through main.js you’ll see this file:

import { streamGemini } from './gemini-api.js';

let form = document.querySelector('form');
let promptInput = document.querySelector('input[name="prompt"]');
let output = document.querySelector('.output');

form.onsubmit = async (ev) => {
  ev.preventDefault();
  output.textContent = 'Generating...';

  try {
    // Load the image as a base64 string
    let imageUrl = form.elements.namedItem('chosen-image').value;
    let imageBase64 = await fetch(imageUrl)
      .then(r => r.arrayBuffer())
      .then(a => base64js.fromByteArray(new Uint8Array(a)));

    // Assemble the prompt by combining the text with the chosen image
    let contents = [
      {
        role: 'user',
        parts: [
          { inline_data: { mime_type: 'image/jpeg', data: imageBase64, } },
          { text: promptInput.value }
        ]
      }
    ];

    // Call the multimodal model, and get a stream of results
    let stream = streamGemini({
      model: 'gemini-1.5-flash', // or gemini-1.5-pro
      contents,
    });

    // Read from the stream and interpret the output as markdown
    let buffer = [];
    let md = new markdownit();
    for await (let chunk of stream) {
      buffer.push(chunk);
      output.innerHTML = md.render(buffer.join(''));
    }
  } catch (e) {
    output.innerHTML += '<hr>' + e;
  }
};
Enter fullscreen mode Exit fullscreen mode

In it you can see that when the form is submit, it passes the model and the ‘contents’ — this contents object is what gemini expects and in python it mostly does a pass through.

content = req_body.get("contents")
model = genai.GenerativeModel(model_name=req_body.get("model"))
response = model.generate_content(content, stream=True)
Enter fullscreen mode Exit fullscreen mode

It’s quite a straightforward setup — we base64 encode the image, upload it along with the prompt to gemini and then stream the response.

You can also see the stream response is coming from the gemini-api.js file:

/**
 * Calls the given Gemini model with the given image and/or text
 * parts, streaming output (as a generator function).
 */
export async function* streamGemini({
  model = 'gemini-1.5-flash', // or gemini-1.5-pro
  contents = [],
} = {}) {
  // Send the prompt to the Python backend
  // Call API defined in main.py
  let response = await fetch("/api/generate", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ model, contents })
  });

  yield* streamResponseChunks(response);
}

/**
 * A helper that streams text output chunks from a fetch() response.
 */
async function* streamResponseChunks(response) {
  let buffer = '';

  const CHUNK_SEPARATOR = '\n\n';

  let processBuffer = async function* (streamDone = false) {
    while (true) {
      let flush = false;
      let chunkSeparatorIndex = buffer.indexOf(CHUNK_SEPARATOR);
      if (streamDone && chunkSeparatorIndex < 0) {
        flush = true;
        chunkSeparatorIndex = buffer.length;
      }
      if (chunkSeparatorIndex < 0) {
        break;
      }

      let chunk = buffer.substring(0, chunkSeparatorIndex);
      buffer = buffer.substring(chunkSeparatorIndex + CHUNK_SEPARATOR.length);
      chunk = chunk.replace(/^data:\s*/, '').trim();
      if (!chunk) {
        if (flush) break;
        continue;
      }
      let { error, text } = JSON.parse(chunk);
      if (error) {
        console.error(error);
        throw new Error(error?.message || JSON.stringify(error));
      }
      yield text;
      if (flush) break;
    }
  };

  const reader = response.body.getReader();
  try {
    while (true) {
      const { done, value } = await reader.read()
      if (done) break;
      buffer += new TextDecoder().decode(value);
      console.log(new TextDecoder().decode(value));
      yield* processBuffer();
    }
  } finally {
    reader.releaseLock();
  }

  yield* processBuffer(true);
}
Enter fullscreen mode Exit fullscreen mode

In this file, the response is streamed from the server and rendered into markdown as the data comes back to the frontend.

That’s a wrap

We’ve covered a fair bit in this hello-world application, explaining how to get gemini setup in a python and flask app inside IDX, how to stream the response and how to use images as part of it!

@app.route('/')
def index():
    return render_template('index.html')

@app.route('/generate', methods=['POST'])
def generate_recipe():
    if request.method == 'POST':
        selected_image = request.form['selected_image']
        prompt = request.form['prompt']

        # ... (Code to process the image and prompt, and send a request to the Gemini API)

        response = genai.generate_text(
            model='models/gemini-1.5-pro-vision',
            prompt=prompt,
            image=image_data,
            stream_output=True
        )

        # ... (Code to stream the response back to the front-end)

        return jsonify({'status': 'success', 'response': streamed_response})
        ```
{% endraw %}


### Handle User Input and Display Results

In your {% raw %}`index.html`{% endraw %} file, add JavaScript code to handle user input, send requests to the Flask app, and display the streamed response from the Gemini API.
{% raw %}


```javascript
// ... (Previous code from step 3)

function generateRecipe() {
    const selectedImage = // ... (Get the URL of the selected image)
    const prompt = document.getElementById('prompt').value;

    fetch('/generate', {
        method: 'POST',
        headers: {
            'Content-Type': 'application/x-www-form-urlencoded'
        },
        body: `selected_image=${selectedImage}&prompt=${prompt}`
    })
    .then(response => response.json())
    .then(data => {
        if (data.status === 'success') {
            const resultsDiv = document.getElementById('results');
            resultsDiv.innerHTML = ''; // Clear previous results

            // ... (Code to display the streamed response in the resultsDiv)
        } else {
            // ... (Handle errors)
        }
    })
    .catch(error => {
        // ... (Handle errors)
    });
}

Enter fullscreen mode Exit fullscreen mode

Run the Flask Application

Run your Flask app using the following command:

#!/bin/sh

python -m flask --app main run -p $PORT --debug

Enter fullscreen mode Exit fullscreen mode

Now, you should be able to access your web application in your browser, select an image of a baked good, enter a prompt, and generate a baking recipe using the Gemini API. If you're running inside of IDX, you can just run the 'web preview' - click Cmd+Shift+P and enter 'Web preview' (if on a mac) and you'll see the preview window.

A wrap for now

That’s a wrap for now, in the next post we’ll talk through changing this to use vertex AI instead

Build a serverless AI App with Cloud run, Python, Gemini and Vertex AI

This is part 1 in a series!

Show me the code:

https://github.com/dllewellyn/hello-gemini

Introduction

In this series of tutorials, we’re going to use a suite of google tools — AI, cloud and dev — in order to create and deploy an AI powered application. We’ll steer clear on chatbots, because they are a tedious in the extreme and focus on building something more interesting.

Creating a “Hello World” Flask Application with Google IDX and the Gemini API

To get started, we’re going to use the ‘hello world’ IDX — Gemini and flask application. This gives us a really quick way to get setup with some AI tooling.

Project Setup

  1. Navigate to Google IDX: Begin by accessing the Google IDX platform in your web browser.
  2. Select the Gemini API Template: From the IDX welcome screen, locate and click on the “Gemini API” template under the “Start something new with a template” section.

!https://cdn-images-1.medium.com/max/800/1*3UbWMkF7ijeiRDG11z7KDw.png

  1. Configure New Workspace: A “New Workspace” window will appear.
  2. Name your workspace: I’ve just called it “hello-gemini.”
  3. Environment: Choose the “Python Web App (Flask)” option from the dropdown menu.

!https://cdn-images-1.medium.com/max/800/1*s9j9Z4H7RirlczOCKohNiw.png

  1. Create Workspace: Once configured, click the “Create” button to initialise the workspace creation.
  2. Await Setup Completion: IDX will set up the necessary environment for your Flask application.

With your workspace ready, we’ve got a basic ‘gemini’ application

Looking through the Hello World application

Obtain a Gemini API Key

!https://cdn-images-1.medium.com/max/800/1*c6tBFPL_FQN9mSHMu84AiQ.png

Before you begin, you’ll need an API key to access the Gemini API.

  1. Visit the Google Cloud Console.
  2. Navigate to the ‘API Keys’ section.
  3. Click ‘Create API Key’.
  4. Choose an existing Google Cloud project or create a new one.
  5. Copy the generated API key. Remember to store your API key securely!

Set up the Flask Application

Check the existing app in main.py — update this with your actual API key.

This is the basic setup of the gemini API

import os
import json
from google.generativeai import genai
from flask import Flask, render_template, request, jsonify

# Replace 'YOUR_API_KEY' with your actual API key
API_KEY = 'YOUR_API_KEY'
genai.configure(api_key=API_KEY)
app = Flask(__name__)
Enter fullscreen mode Exit fullscreen mode

Check theHTML Template

Take a look in index.htmlto serve as the front-end for your web application. This template will display images of baked goods, an input field for the prompt, and a results section.

<!DOCTYPE html>
<html>
<head>
    <title>Baking with the Gemini API</title>
</head>
<body>
    <h1>Baking with the Gemini API</h1>
    <div>
        <img src="images/baked-good-1.jpg" alt="Baked Good 1">
        <img src="images/baked-good-2.jpg" alt="Baked Good 2">
        <img src="images/baked-good-3.jpg" alt="Baked Good 3">
    </div>
    <div>
        <label for="prompt">Provide an example recipe for the baked goods in:</label>
        <input type="text" id="prompt" name="prompt">
        <button onclick="generateRecipe()">Go</button>
    </div>
    <div id="results">
        <h2>Results will appear here</h2>
    </div>
</body>
</html>
Enter fullscreen mode Exit fullscreen mode

You’ll see three images on the screen and a prompt input.

Define Flask Routes and Functions

In your main.py file, define the routes and functions to handle requests from the front-end and interact with the Gemini API.

@app.route('/')
def index():
  return render_template('index.html')

@app.route("/api/generate", methods=["POST"])
def generate_api():
    if request.method == "POST":
        try:
            req_body = request.get_json()
            content = req_body.get("contents")
            model = genai.GenerativeModel(model_name=req_body.get("model"))
            response = model.generate_content(content, stream=True)
            def stream():
                for chunk in response:
                    yield 'data: %s\n\n' % json.dumps({ "text": chunk.text })

            return stream(), {'Content-Type': 'text/event-stream'}

        except Exception as e:
            return jsonify({ "error": str(e) })
Enter fullscreen mode Exit fullscreen mode

In this block of code, you’ll notice that we take the ‘model’ from the input, and the ‘contents’ which have been fired to the API from javascript.

We ‘stream’ the response — and that means we can pass that streamed content back to the frontend so they’re not waiting for everything to finish before showing it to the user.

Run the Flask Application

If you’re inside IDX, you can just view it in the ‘preview’ window

!https://cdn-images-1.medium.com/max/800/1*Xl8qXDvkkDJ9PoozstwupA.png

Now, you should be able to access your web application in your browser, select an image of a baked good, enter a prompt, and generate a baking recipe using the Gemini API. If you’re running inside of IDX, you can just run the ‘web preview’ — click Cmd+Shift+P and enter ‘Web preview’ (if on a mac) and you’ll see the preview window.

Hit generate and you’ll see a recipe based on the image you’ve selected:

!https://cdn-images-1.medium.com/max/800/1*9hBTJhjNHWv4lmZ-yFvMAw.png

Analysing the javascript

If you look through main.js you’ll see this file:

import { streamGemini } from './gemini-api.js';

let form = document.querySelector('form');
let promptInput = document.querySelector('input[name="prompt"]');
let output = document.querySelector('.output');

form.onsubmit = async (ev) => {
  ev.preventDefault();
  output.textContent = 'Generating...';

  try {
    // Load the image as a base64 string
    let imageUrl = form.elements.namedItem('chosen-image').value;
    let imageBase64 = await fetch(imageUrl)
      .then(r => r.arrayBuffer())
      .then(a => base64js.fromByteArray(new Uint8Array(a)));

    // Assemble the prompt by combining the text with the chosen image
    let contents = [
      {
        role: 'user',
        parts: [
          { inline_data: { mime_type: 'image/jpeg', data: imageBase64, } },
          { text: promptInput.value }
        ]
      }
    ];

    // Call the multimodal model, and get a stream of results
    let stream = streamGemini({
      model: 'gemini-1.5-flash', // or gemini-1.5-pro
      contents,
    });

    // Read from the stream and interpret the output as markdown
    let buffer = [];
    let md = new markdownit();
    for await (let chunk of stream) {
      buffer.push(chunk);
      output.innerHTML = md.render(buffer.join(''));
    }
  } catch (e) {
    output.innerHTML += '<hr>' + e;
  }
};
Enter fullscreen mode Exit fullscreen mode

In it you can see that when the form is submit, it passes the model and the ‘contents’ — this contents object is what gemini expects and in python it mostly does a pass through.

content = req_body.get("contents")
model = genai.GenerativeModel(model_name=req_body.get("model"))
response = model.generate_content(content, stream=True)
Enter fullscreen mode Exit fullscreen mode

It’s quite a straightforward setup — we base64 encode the image, upload it along with the prompt to gemini and then stream the response.

You can also see the stream response is coming from the gemini-api.js file:

/**
 * Calls the given Gemini model with the given image and/or text
 * parts, streaming output (as a generator function).
 */
export async function* streamGemini({
  model = 'gemini-1.5-flash', // or gemini-1.5-pro
  contents = [],
} = {}) {
  // Send the prompt to the Python backend
  // Call API defined in main.py
  let response = await fetch("/api/generate", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ model, contents })
  });

  yield* streamResponseChunks(response);
}

/**
 * A helper that streams text output chunks from a fetch() response.
 */
async function* streamResponseChunks(response) {
  let buffer = '';

  const CHUNK_SEPARATOR = '\n\n';

  let processBuffer = async function* (streamDone = false) {
    while (true) {
      let flush = false;
      let chunkSeparatorIndex = buffer.indexOf(CHUNK_SEPARATOR);
      if (streamDone && chunkSeparatorIndex < 0) {
        flush = true;
        chunkSeparatorIndex = buffer.length;
      }
      if (chunkSeparatorIndex < 0) {
        break;
      }

      let chunk = buffer.substring(0, chunkSeparatorIndex);
      buffer = buffer.substring(chunkSeparatorIndex + CHUNK_SEPARATOR.length);
      chunk = chunk.replace(/^data:\s*/, '').trim();
      if (!chunk) {
        if (flush) break;
        continue;
      }
      let { error, text } = JSON.parse(chunk);
      if (error) {
        console.error(error);
        throw new Error(error?.message || JSON.stringify(error));
      }
      yield text;
      if (flush) break;
    }
  };

  const reader = response.body.getReader();
  try {
    while (true) {
      const { done, value } = await reader.read()
      if (done) break;
      buffer += new TextDecoder().decode(value);
      console.log(new TextDecoder().decode(value));
      yield* processBuffer();
    }
  } finally {
    reader.releaseLock();
  }

  yield* processBuffer(true);
}
Enter fullscreen mode Exit fullscreen mode

In this file, the response is streamed from the server and rendered into markdown as the data comes back to the frontend.

That’s a wrap

We’ve covered a fair bit in this hello-world application, explaining how to get gemini setup in a python and flask app inside IDX, how to stream the response and how to use images as part of it!

Top comments (0)