This is part 1 in a series!
Show me the code:
https://github.com/dllewellyn/hello-gemini
Introduction
In this series of tutorials, we’re going to use a suite of Google tools — AI, cloud and dev — to create and deploy an AI-powered application. We’ll steer clear of chatbots, because they are tedious in the extreme, and focus on building something more interesting.
Creating a “Hello World” Flask Application with Google IDX and the Gemini API
To get started, we’re going to use the ‘hello world’ IDX Gemini and Flask template application. This gives us a really quick way to get set up with some AI tooling.
Project Setup
- Navigate to Google IDX: Begin by accessing the Google IDX platform in your web browser.
- Select the Gemini API Template: From the IDX welcome screen, locate and click on the “Gemini API” template under the “Start something new with a template” section.
- Configure New Workspace: A “New Workspace” window will appear.
- Name your workspace: I’ve just called it “hello-gemini.”
- Environment: Choose the “Python Web App (Flask)” option from the dropdown menu.
- Create Workspace: Once configured, click the “Create” button to initialise the workspace creation.
- Await Setup Completion: IDX will set up the necessary environment for your Flask application.
With your workspace ready, we’ve got a basic Gemini application.
Looking through the Hello World application
Obtain a Gemini API Key
Before you begin, you’ll need an API key to access the Gemini API.
- Visit the Google Cloud Console.
- Navigate to the ‘API Keys’ section.
- Click ‘Create API Key’.
- Choose an existing Google Cloud project or create a new one.
- Copy the generated API key. Remember to store your API key securely!
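Since the key needs storing securely, one option (an assumption on my part, not something the template enforces) is to read it from an environment variable rather than pasting it into source. A minimal sketch, where the variable name GOOGLE_API_KEY is just a convention used here:

```python
import os


def load_api_key(env=os.environ):
    """Read the Gemini API key from the environment instead of hardcoding
    it in main.py. GOOGLE_API_KEY is a hypothetical variable name chosen
    for this sketch, not something the template requires."""
    key = env.get("GOOGLE_API_KEY")
    if not key:
        raise RuntimeError("Set GOOGLE_API_KEY before starting the app")
    return key


# Example: pass a plain dict to demonstrate the lookup
key = load_api_key({"GOOGLE_API_KEY": "demo-key"})
```

You'd then call `genai.configure(api_key=load_api_key())` instead of hardcoding the key.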
Set up the Flask Application
Check the existing app in main.py and update it with your actual API key. This is the basic setup of the Gemini API:
import os
import json
import google.generativeai as genai
from flask import Flask, render_template, request, jsonify

# Replace 'YOUR_API_KEY' with your actual API key
API_KEY = 'YOUR_API_KEY'
genai.configure(api_key=API_KEY)

app = Flask(__name__)
Check the HTML Template
Take a look in index.html, which serves as the front-end for your web application. This template displays images of baked goods, an input field for the prompt, and a results section.
<!DOCTYPE html>
<html>
<head>
<title>Baking with the Gemini API</title>
</head>
<body>
<h1>Baking with the Gemini API</h1>
<form>
<div>
<label><input type="radio" name="chosen-image" value="images/baked-good-1.jpg" checked><img src="images/baked-good-1.jpg" alt="Baked Good 1"></label>
<label><input type="radio" name="chosen-image" value="images/baked-good-2.jpg"><img src="images/baked-good-2.jpg" alt="Baked Good 2"></label>
<label><input type="radio" name="chosen-image" value="images/baked-good-3.jpg"><img src="images/baked-good-3.jpg" alt="Baked Good 3"></label>
</div>
<div>
<label for="prompt">Provide an example recipe for the baked goods in:</label>
<input type="text" id="prompt" name="prompt">
<button type="submit">Go</button>
</div>
</form>
<div class="output">Results will appear here</div>
<!-- main.js expects the form, the chosen-image radios and the .output
     element; it also relies on base64js and markdownit, which the
     template loads for you -->
<script type="module" src="main.js"></script>
</body>
</html>
You’ll see three images on the screen and a prompt input.
Define Flask Routes and Functions
In your main.py file, define the routes and functions to handle requests from the front-end and interact with the Gemini API.
@app.route('/')
def index():
    return render_template('index.html')


@app.route("/api/generate", methods=["POST"])
def generate_api():
    if request.method == "POST":
        try:
            req_body = request.get_json()
            content = req_body.get("contents")
            model = genai.GenerativeModel(model_name=req_body.get("model"))
            response = model.generate_content(content, stream=True)

            def stream():
                for chunk in response:
                    yield 'data: %s\n\n' % json.dumps({"text": chunk.text})

            return stream(), {'Content-Type': 'text/event-stream'}
        except Exception as e:
            return jsonify({"error": str(e)})
In this block of code, you’ll notice that we take the ‘model’ and the ‘contents’ from the request body, which have been sent to the API from the JavaScript front-end.
We ‘stream’ the response — which means we can pass that streamed content back to the front-end as it arrives, so the user isn’t waiting for everything to finish before seeing anything.
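The server-sent-events format the route yields is simple enough to sketch in isolation. Here’s a minimal, hypothetical helper (not part of the template) that formats text chunks the same way the stream() generator above does:

```python
import json


def to_sse_events(chunks):
    """Format an iterable of text chunks as server-sent events.

    Each event is a 'data: <json>' line followed by a blank line,
    matching what the Flask stream() generator yields."""
    for chunk in chunks:
        yield 'data: %s\n\n' % json.dumps({"text": chunk})


# Example: three model chunks become three SSE events
events = list(to_sse_events(["Preheat ", "the oven ", "to 180C."]))
```

The blank line (`\n\n`) is what marks the end of one event, which is why the front-end splits the stream on it.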
Run the Flask Application
If you’re running inside IDX, you can just view the app in the ‘web preview’ window: press Cmd+Shift+P (on a Mac) and enter ‘Web preview’ to open it. You should then be able to access your web application in the browser, select an image of a baked good, enter a prompt, and generate a baking recipe using the Gemini API.
Hit generate and you’ll see a recipe based on the image you’ve selected:
Analysing the JavaScript
If you look through main.js, you’ll see this:
import { streamGemini } from './gemini-api.js';

let form = document.querySelector('form');
let promptInput = document.querySelector('input[name="prompt"]');
let output = document.querySelector('.output');

form.onsubmit = async (ev) => {
  ev.preventDefault();
  output.textContent = 'Generating...';

  try {
    // Load the image as a base64 string
    let imageUrl = form.elements.namedItem('chosen-image').value;
    let imageBase64 = await fetch(imageUrl)
      .then(r => r.arrayBuffer())
      .then(a => base64js.fromByteArray(new Uint8Array(a)));

    // Assemble the prompt by combining the text with the chosen image
    let contents = [
      {
        role: 'user',
        parts: [
          { inline_data: { mime_type: 'image/jpeg', data: imageBase64 } },
          { text: promptInput.value }
        ]
      }
    ];

    // Call the multimodal model, and get a stream of results
    let stream = streamGemini({
      model: 'gemini-1.5-flash', // or gemini-1.5-pro
      contents,
    });

    // Read from the stream and interpret the output as markdown
    let buffer = [];
    let md = new markdownit();
    for await (let chunk of stream) {
      buffer.push(chunk);
      output.innerHTML = md.render(buffer.join(''));
    }
  } catch (e) {
    output.innerHTML += '<hr>' + e;
  }
};
In it you can see that when the form is submitted, it passes the model and the ‘contents’ — this contents object is what Gemini expects, and in Python it’s mostly passed straight through:
content = req_body.get("contents")
model = genai.GenerativeModel(model_name=req_body.get("model"))
response = model.generate_content(content, stream=True)
It’s quite a straightforward setup — we base64-encode the image, upload it along with the prompt to Gemini, and then stream the response.
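The same payload the browser assembles can be sketched in Python. This is a hypothetical helper for illustration only (the template builds it in JavaScript) — it shows the shape of the contents list that the backend passes to generate_content:

```python
import base64


def build_contents(image_bytes, prompt):
    """Build a Gemini-style 'contents' payload from raw image bytes and a
    text prompt, mirroring what main.js assembles in the browser."""
    image_base64 = base64.b64encode(image_bytes).decode("ascii")
    return [
        {
            "role": "user",
            "parts": [
                {"inline_data": {"mime_type": "image/jpeg", "data": image_base64}},
                {"text": prompt},
            ],
        }
    ]


# Example with a few placeholder JPEG bytes
contents = build_contents(b"\xff\xd8\xff", "Provide an example recipe")
```

The key detail is the parts list: one inline_data part carrying the base64 image, one text part carrying the prompt, both under a single 'user' role.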
You can also see that the streamed response is handled by the gemini-api.js file:
/**
 * Calls the given Gemini model with the given image and/or text
 * parts, streaming output (as a generator function).
 */
export async function* streamGemini({
  model = 'gemini-1.5-flash', // or gemini-1.5-pro
  contents = [],
} = {}) {
  // Send the prompt to the Python backend
  // Call API defined in main.py
  let response = await fetch("/api/generate", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ model, contents })
  });

  yield* streamResponseChunks(response);
}

/**
 * A helper that streams text output chunks from a fetch() response.
 */
async function* streamResponseChunks(response) {
  let buffer = '';
  const CHUNK_SEPARATOR = '\n\n';

  let processBuffer = async function* (streamDone = false) {
    while (true) {
      let flush = false;
      let chunkSeparatorIndex = buffer.indexOf(CHUNK_SEPARATOR);
      if (streamDone && chunkSeparatorIndex < 0) {
        flush = true;
        chunkSeparatorIndex = buffer.length;
      }
      if (chunkSeparatorIndex < 0) {
        break;
      }

      let chunk = buffer.substring(0, chunkSeparatorIndex);
      buffer = buffer.substring(chunkSeparatorIndex + CHUNK_SEPARATOR.length);
      chunk = chunk.replace(/^data:\s*/, '').trim();
      if (!chunk) {
        if (flush) break;
        continue;
      }

      let { error, text } = JSON.parse(chunk);
      if (error) {
        console.error(error);
        throw new Error(error?.message || JSON.stringify(error));
      }
      yield text;
      if (flush) break;
    }
  };

  const reader = response.body.getReader();
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      buffer += new TextDecoder().decode(value);
      yield* processBuffer();
    }
  } finally {
    reader.releaseLock();
  }
  yield* processBuffer(true);
}
In this file, the response stream from the server is read chunk by chunk, split on the SSE separators, and the text is yielded back to main.js, where it’s rendered into markdown as the data comes back to the front-end.
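That chunk-splitting logic is easy to mirror in Python. Here’s a hypothetical equivalent for illustration only (the template does this in the browser): it splits an SSE buffer on blank lines and extracts the text field from each 'data:' chunk:

```python
import json


def parse_sse_buffer(buffer):
    """Split a server-sent-events buffer on blank lines and extract the
    'text' field from each 'data: <json>' chunk, mirroring what
    streamResponseChunks does in gemini-api.js."""
    texts = []
    for chunk in buffer.split("\n\n"):
        # Strip the 'data:' prefix and surrounding whitespace
        chunk = chunk.removeprefix("data:").strip()
        if not chunk:
            continue  # skip empty trailing fragments
        payload = json.loads(chunk)
        if "error" in payload:
            raise RuntimeError(payload["error"])
        texts.append(payload["text"])
    return texts


# Example: a buffer containing two complete SSE events
parts = parse_sse_buffer('data: {"text": "Mix "}\n\ndata: {"text": "well."}\n\n')
```

The real implementation is trickier because chunks can arrive split mid-event, which is why the JavaScript keeps a rolling buffer and only flushes what’s complete.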
That’s a wrap
We’ve covered a fair bit in this hello-world application: how to get Gemini set up in a Python and Flask app inside IDX, how to stream the response, and how to use images as part of it!