TL;DR:
The first pair of posts in this ongoing Gemini API series provides a thorough introduction to using the API from Google AI. (Existing GCP users can migrate their code to the Vertex AI platform without much difficulty.) However, while command-line scripts are a great way to get started, they aren't how you're going to reach users. Aspiring data scientists and AI professionals make great use of powerful tools like Jupyter Notebooks, but being able to create and prototype web apps is also useful. This post addresses both of these "issues" by demonstrating use of the Gemini API in a basic genAI web app using Flask (Python) or Express.js (Node.js), all in about 100 lines of code!
Nov 2025 update: Updated the sample code to Gemini 2.5 (Flash). For more details, see this post (and its corresponding repo), which also serves as a migration guide to the new client library. A FastAPI version has also been added to join the original Flask app and is integrated into the post below.
Introduction
Welcome to the blog covering Google developer technologies. Whether you're learning how to code Google Maps, export Google Docs as PDF, or get [a broader perspective of serverless computing](/wescpy/a-broader-perspective-of-serverless-1md1), this is the right place to be. You'll also find posts on common knowledge like credentials, including API keys and OAuth client IDs... all of this from Python and sometimes Node.js.
If you've been following along in this series covering the Gemini API, you now know how to perform text-only and multimodal queries, use streaming, and multi-turn (or "chat") conversations, all from the command-line issued against one of the Gemini LLMs (large language models). It's time to take it to the next level by building a basic web app that uses the Gemini API.
Application
Regardless of whether you build the app with Python or Node.js, the app works identically. End-users upload an image file (JPG, PNG, and GIF formats supported) along with a text prompt. The app then performs a multimodal query to the Gemini API using the latest 2.5 Flash model, then displays a reduced-size version of the image along with the prompt as well as the generated result from the model.
The Python app comes in both Flask and FastAPI versions while the Node app uses Express.js. (The Jinja2 web templating system is supported by both Python frameworks and shares its syntax with Nunjucks, a similar template system for Node.js inspired by Jinja2.)
All versions of the app, with comments, plus the web templates can be found in the repo folder. Be sure you've created an API key and stored it as API_KEY = '<YOUR_API_KEY>' in settings.py for Python or .env for Node.js before jumping into the code. We'll start with Python first. (Settings environment templates for both files are available in the repo [see below].)
Python
Application files
| App | Description | Platform |
|---|---|---|
| `python/settings_TMPL.py` | Environment settings template | Python 3 |
| `python/requirements.txt` | 3rd-party packages (Flask) | Python 3 |
| `python/main.py` | Flask sample app | Python 3 |
| `python/templates/index.html` | Web template | Jinja2 (identical to Nunjucks) |
| `python/fastapi/requirements.txt` | 3rd-party packages (FastAPI) | Python 3 |
| `python/fastapi/main.py` | FastAPI sample app | Python 3 |
For the FastAPI version, grab the files from the fastapi subfolder and overwrite their Flask equivalents in the main folder; all other files remain as-is.
Setup
- Ensure your Python (and `pip`) installation is up-to-date (3.9+ recommended)
- (optional) Create & activate a virtual environment ("virtualenv") for isolation: `python3 -m venv .myenv; source .myenv/bin/activate`
- For the commands below, depending on your system configuration, you will use one of (`pip`, `pip3`, `python3 -m pip`), but the instructions are generalized to `pip`
- (optional) Update `pip` and install `uv`: `pip install -U pip uv`
- Install all packages (old & new client libraries): `uv pip install -Ur requirements.txt` (drop `uv` if you didn't install it)
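Before moving on, a quick standard-library snippet (a sketch for illustration, not part of the sample app) can confirm your interpreter meets the 3.9+ recommendation; the `check_python()` helper name is made up:

```python
import sys

def check_python(minimum=(3, 9)):
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= minimum

if __name__ == '__main__':
    status = 'OK' if check_python() else 'too old'
    print(f'Python {sys.version_info.major}.{sys.version_info.minor}: {status}')
```

Run it with whichever interpreter you plan to use for the app (`python3`, your virtualenv's `python`, etc.).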
Code walk-through
The only new package installed this time is the Flask micro web framework, which comes with the Jinja2 templating system. Let's dive into the main application file, one chunk at a time, starting with the imports:
Flask
from base64 import b64encode
import io
from flask import Flask, render_template, request, url_for
from werkzeug.utils import secure_filename
from PIL import Image
import mistune
from google import genai
from settings import API_KEY
The io standard library package has an object (io.BytesIO) used in this app as an in-memory "disk file" for the thumbnail. You create your own settings.py local file to hold the API key (it is deliberately not in the repo). There are some 3rd-party packages as well:
| Module/package | Use |
|---|---|
| `flask` | Flask: popular synchronous micro web framework |
| `werkzeug` | Werkzeug: collection of WSGI web app utilities |
| `PIL` | Pillow: flexible fork of the well-known Python Imaging Library (PIL) |
| `mistune` | Mistune: popular Python Markdown parsing library |
| `google.genai` | Google GenAI SDK: provides access to Gemini API & models |
Werkzeug and the Jinja2 templating system are Flask dependencies... for the most part these are resources requested directly from Flask save for the single explicit import of Werkzeug's secure_filename() function which is used in the app.
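As an aside, the io.BytesIO "in-memory disk file" mentioned above is easy to try standalone; the bytes below are a made-up stand-in for real thumbnail data:

```python
import io

# made-up stand-in for thumbnail bytes (not a real image)
fake_thumb = b'\x89PNG...pretend-thumbnail-bytes'

img_io = io.BytesIO()      # in-memory "disk file"
img_io.write(fake_thumb)   # in the app, Pillow's thumb.save() writes here
img_io.seek(0)             # rewind so reads start from the beginning
data = img_io.getvalue()   # grab everything written so far
print(data == fake_thumb)  # True
```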
The FastAPI version is similar:
FastAPI
from base64 import b64encode
import io
from fastapi import FastAPI, Request, UploadFile, File, Form
from fastapi.responses import HTMLResponse
from fastapi.templating import Jinja2Templates
from PIL import Image
import mistune
from google import genai
from settings import API_KEY # can also use .env & python-dotenv
| Module/package | Use |
|---|---|
| `fastapi[standard]` | FastAPI: popular asynchronous micro web framework and related packages |
| `pillow` | Pillow: flexible fork of the well-known Python Imaging Library (PIL) |
| `mistune` | Mistune: popular Python Markdown parsing library |
| `google.genai` | Google GenAI SDK: provides access to Gemini API & models |
The fastapi[standard] entry is an extended package which includes fastapi, uvicorn (ASGI server), python-multipart (form data), and fastapi-cli (FastAPI command-line tool).
(both)
ALLOW_EXTS = {'png', 'jpg', 'jpeg', 'gif'}
MODEL_NAME = 'gemini-2.5-flash'
THUMB_DIMS = 480, 360
JINUN_TMPL = 'index.html'
| Constant | Use |
|---|---|
| `ALLOW_EXTS` | Image file types supported by the app |
| `MODEL_NAME` | Gemini LLM model to use in this app |
| `THUMB_DIMS` | Thumbnail dimensions |
| `JINUN_TMPL` | Jinja2/Nunjucks template |
Flask
app = Flask(__name__)
GENAI = genai.Client(api_key=API_KEY)
FastAPI
app = FastAPI()
templates = Jinja2Templates(directory='templates')
GENAI = genai.Client(api_key=API_KEY)
After the web framework is initialized, the Gemini API client object is instantiated using the provided API key for authorization. The FastAPI version also sets the template folder explicitly (templates), whereas that's already the default in Flask.
(both)
def is_allowed_file(fname: str) -> bool:
    return '.' in fname and fname.rsplit('.', 1)[1].lower() in ALLOW_EXTS
The is_allowed_file() function takes an uploaded file's name and checks it against the image file types supported by the app per ALLOW_EXTS. Everything else is the main application.
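To see the helper in action outside the app, here it is again with a few made-up file names:

```python
ALLOW_EXTS = {'png', 'jpg', 'jpeg', 'gif'}

def is_allowed_file(fname: str) -> bool:
    'confirm the file name has an extension supported by the app'
    return '.' in fname and fname.rsplit('.', 1)[1].lower() in ALLOW_EXTS

print(is_allowed_file('waterfall.JPG'))  # True (check is case-insensitive)
print(is_allowed_file('notes.txt'))      # False (unsupported type)
print(is_allowed_file('README'))         # False (no extension at all)
```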
App operation
Before we dive into that, let's visualize what the app does so you can connect the dots more easily when reviewing the rest of the code. When you hit the app the first time, you get an initial empty form view:
You can clearly see the two primary form elements, a file-picker to choose the image and a text field for the LLM prompt (which has a default of "Describe this image"), plus a submit button. From here, a user is expected to choose a locally-stored image file:
I picked the waterfall picture from the previous post. After the image has been selected, modify the prompt to something of your choosing. Below, I changed the default to "Where is this and what is it?"
With the image selected and prompt set, clicking on the submit button "causes things to happen," and the resulting screen shows a smaller thumbnail of the selected image, the prompt entered, the model used, and the LLM results:
Note that in all these steps, there's always a blank form at the bottom of every step so users can move to another image once they're done with the one they're on. With that, let's look at the main handler:
Main handler: error-checking
The first half of the main handler consists of checking for bad input... take a look:
Flask
@app.route('/', methods=['GET', 'POST'])
def main():
    context = {'upload_url': url_for(request.endpoint)}
    if request.method == 'POST':
        upload = request.files.get('file')
        if not upload:
            context['error'] = 'No uploaded file'
            return render_template(JINUN_TMPL, **context)
        fname = secure_filename(upload.filename.strip())
        if not fname:
            context['error'] = 'Upload must have file name'
            return render_template(JINUN_TMPL, **context)
        if not is_allowed_file(fname):
            context['error'] = 'Only JPG/PNG/GIF files allowed'
            return render_template(JINUN_TMPL, **context)
        prompt = request.form.get('prompt', '').strip()
        if not prompt:
            context['error'] = 'LLM prompt missing'
            return render_template(JINUN_TMPL, **context)
This is the only handler in the app, supporting both POST and GET requests, meaning it handles the initial empty form page as well as processing actual work requests (via POST). Since the page always shows a blank form at the bottom, the first thing you see is the template context being set with the upload URL redirecting the app to the same endpoint (/).
The next four sections handle various types of bad input:
- No uploaded file
- Upload with no file name
- Upload with unsupported file type
- No LLM prompt
In each of these situations, an error message is set for the template, and the user is sent straight back to the web template. Any errors are highlighted for reference followed by the same empty form giving the user a chance to correct their mistake. Here's what it looks like if you forgot to upload a file:
FastAPI
@app.post('/')
async def process_form(request: Request, file: UploadFile = File(...),
                       prompt: str = Form(...)):
    context: dict = {'upload_url': '/', 'request': request}
    if not file.file:
        context['error'] = 'No uploaded file'
        return templates.TemplateResponse(JINUN_TMPL, context)
    fname = file.filename.strip()
    if not fname:
        context['error'] = 'Upload must have file name'
        return templates.TemplateResponse(JINUN_TMPL, context)
    if not is_allowed_file(fname):
        context['error'] = 'Only JPG/PNG/GIF files allowed'
        return templates.TemplateResponse(JINUN_TMPL, context)
    if not prompt:
        context['error'] = 'LLM prompt missing'
        return templates.TemplateResponse(JINUN_TMPL, context)
Because the POST handler declares required form fields which aren't present in a GET request, the two methods are split into separate handlers in FastAPI. The code above is just the POST handler... we'll see the equivalent GET soon. The rest of the error-checking logic is identical save for the different template rendering syntax.
Main handler: core functionality
The last chunk of code is where all the magic happens.
Flask
        try:
            image = Image.open(upload)
            thumb = image.copy()
            thumb.thumbnail(THUMB_DIMS)
            img_io = io.BytesIO()
            thumb.save(img_io, format=image.format)
            img_io.seek(0)
        except IOError:
            context['error'] = 'Invalid image file/format'
            return render_template(JINUN_TMPL, **context)
        context['model'] = MODEL_NAME
        context['prompt'] = prompt
        thumb_b64 = b64encode(img_io.getvalue()).decode('ascii')
        context['image'] = f'data:{upload.mimetype};base64,{thumb_b64}'
        context['result'] = mistune.html(GENAI.models.generate_content(
                model=MODEL_NAME, contents=(prompt, image)).text)
    return render_template(JINUN_TMPL, **context)

if __name__ == '__main__':
    import os
    app.run(debug=True, threaded=True, host='0.0.0.0',
            port=int(os.environ.get('PORT', 8080)))
Assuming all the inputs pass muster, the real work begins. A copy of the original image is made, which is then converted into a smaller thumbnail for display. It's saved to an in-memory file object (io.BytesIO) and later base64-encoded for the template. Any errors occurring during this image processing result in an error sent to the template.
If all has succeeded thus far, then it goes through the final round of being sent to the LLM for analysis. Before that happens, all of the necessary fields for a successful computation are sent to the template context, including the prompt, model used, and the base64-encoded thumbnail. The Gemini API is passed the image and prompt, and the returned result is converted from Markdown into HTML by Mistune and added to the context for rendering.
Whether a plain GET, or a POST resulting in all of this processing, the template is then rendered, wrapping up the last part of the handler. The rest of the code just kicks off the Flask development server on port 8080 to run the app. (The "devserver" is great for development and testing, but you would choose a more robust server for production.)
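The base64 data-URI step above is also easy to try in isolation; the bytes and MIME type below are made-up stand-ins for the real thumbnail and upload:

```python
from base64 import b64encode

thumb_bytes = b'pretend-thumbnail-bytes'  # stand-in for img_io.getvalue()
mimetype = 'image/png'                    # in the app: upload.mimetype

# encode the raw bytes, then build the data URI the <img> tag can display
thumb_b64 = b64encode(thumb_bytes).decode('ascii')
data_uri = f'data:{mimetype};base64,{thumb_b64}'
print(data_uri.startswith('data:image/png;base64,'))  # True
```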
FastAPI
    try:
        image = Image.open(file.file)
        thumb = image.copy()
        thumb.thumbnail(THUMB_DIMS)
        img_io = io.BytesIO()
        thumb.save(img_io, format=image.format)
        img_io.seek(0)
    except IOError:
        context['error'] = 'Invalid image file/format'
        return templates.TemplateResponse(JINUN_TMPL, context)
    context['model'] = MODEL_NAME
    context['prompt'] = prompt
    thumb_b64 = b64encode(img_io.getvalue()).decode('ascii')
    context['image'] = f'data:{file.content_type};base64,{thumb_b64}'
    context['result'] = mistune.html(GENAI.models.generate_content(
            model=MODEL_NAME, contents=(prompt, image)).text)
    return templates.TemplateResponse(JINUN_TMPL, context)

@app.get('/')
async def display_form(request: Request):
    context: dict = {'upload_url': '/', 'request': request}
    return templates.TemplateResponse(JINUN_TMPL, context)

if __name__ == '__main__':
    import os
    import uvicorn
    uvicorn.run('main:app', host='0.0.0.0', reload=True,
                port=int(os.environ.get('PORT', 8080)))
The FastAPI version is nearly identical again, with two major differences:
- The need for a separate GET handler, `display_form()`
- Use of the Uvicorn asynchronous web server instead of the Flask dev server
Web template
Now, let's look at the web template to tie the whole thing together:
(both)
<!doctype html>
<html lang="en-US">
<head>
<title>GenAI image analyzer serverless example</title>
<style>
body {
font-family: Verdana, Helvetica, sans-serif;
background-color: #DDDDDD;
}
</style>
</head>
<body>
<h1>GenAI basic image analyzer (v0.5)</h1>
{% if error %}
<h3>Error on previous request</h3>
<p style="color: red;">{{ error }}</p>
<hr>
{% endif %}
{% if result and image %}
<h3>Image uploaded</h3>
<img src="{{ image }}" alt="thumbnail">
<h3>LLM analysis</h3>
<b>Prompt received:</b> {{ prompt }}<p></p>
<b>Model used:</b> {{ model }}<p></p>
<b>Model response:</b> {{ result|safe }}<p></p>
<hr>
{% endif %}
<h3>Analyze an image</h3>
<form action="{{ upload_url }}" onsubmit="submitted()" method="POST" enctype="multipart/form-data">
<label for="file">Upload image to analyze:</label><br>
<input type="file" name="file"><p></p>
<label for="prompt">Image prompt for LLM:</label><br>
<input type="text" name="prompt" value="Describe this image"><p></p>
<input type="submit" id="submit">
</form>
<script>
function submitted() {
document.getElementById("submit").value = "Processing...";
}
</script>
</body>
</html>
The initial headers and limited CSS (Cascading Style Sheets) styling show up at the top, followed by the app title. The error section comes next, displayed only if an error occurs. If an image is processed successfully, the results (via the safe filter) are displayed along with a thumbnail version of the image, LLM name, and prompt. Finally, an empty form shows up afterwards in case the user has another image to analyze. To run the app, just execute python main.py (or python3).
Both the app (main.py) and template (templates/index.html) can be found in the python folder of the repo.
Node.js
Application files
| App | Description | Platform |
|---|---|---|
| `nodejs/.env_TMPL` | Environment settings template | Node |
| `nodejs/package.json` | 3rd-party packages | Node (all) |
| `nodejs/main.js` | Express.js sample app | Node (CommonJS script) |
| `nodejs/main.mjs` | Express.js sample app | Node (ECMAScript module) |
| `nodejs/templates/index.html` | Web template | Nunjucks (identical to Jinja2) |
Setup
The Node version of the app is a near-mirror image of the Python version, and the web template is identical. Nunjucks was inspired by Jinja2 (Python) and uses its syntax, which is why it was chosen over other templating systems like EJS. Next steps:
- Ensure your Node (including NPM) installation is up-to-date (18+ recommended)
- Install packages: `npm i`
Here is the modern JavaScript ECMAScript module, main.mjs:
import 'dotenv/config';
import express from 'express';
import multer from 'multer';
import nunjucks from 'nunjucks';
import sharp from 'sharp';
import { marked } from 'marked';
import { GoogleGenAI } from '@google/genai';
const PORT = process.env.PORT || 8080;
const ALLOW_EXTS = ['png', 'jpg', 'jpeg', 'gif'];
const MODEL_NAME = 'gemini-2.5-flash';
const THUMB_DIMS = [480, 360];
const JINUN_TMPL = 'index.html';
const app = express();
app.use(express.urlencoded({ extended: false }));
nunjucks.configure('templates', { autoescape: true, express: app });
const upload = multer({ storage: multer.memoryStorage() });
const GENAI = new GoogleGenAI({ apiKey: process.env.API_KEY }); // API key authz
async function is_allowed_file(fname) {
  return (fname.includes('.') && ALLOW_EXTS.includes(
    fname.toLowerCase().slice(((fname.lastIndexOf('.') - 1) >>> 0) + 2)));
}
The major difference in this Node version vs. Python is that there is more initialization required for Express.js middleware, such as setting up the Nunjucks templating system and configuring the Multer system to handle file uploads. These are the 3rd-party packages you see imported at the top.
| Package | Use |
|---|---|
| `dotenv` | Dotenv: adds environment variables from `.env` |
| `express` | Express.js: popular micro web framework |
| `multer` | Multer: middleware to handle file uploads |
| `nunjucks` | Nunjucks: JavaScript templating system |
| `sharp` | Sharp: high-performance image processing library |
| `marked` | Marked: popular Node.js Markdown parsing library |
| `@google/genai` | Google GenAI SDK: provides access to Gemini API & models |
Replace the imports with these require() calls to convert to CommonJS; all other lines remain identical to the ECMAScript module:
require('dotenv').config();
const express = require('express');
const multer = require('multer');
const nunjucks = require('nunjucks');
const sharp = require('sharp');
const marked = require('marked');
const { GoogleGenAI } = require('@google/genai');
Python uses fewer 3rd-party packages explicitly because the (Jinja2) templating system is a Flask dependency, and Flask itself handles file uploads. The Python app also uses settings.py, a nod to Django, instead of the .env file used by Node.js, which requires dotenv.
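As an aside, if you'd rather have .env-style configuration on the Python side too (without adding python-dotenv), a minimal loader is only a few lines. This is a hedged sketch with a hypothetical load_env() helper, not code from the sample app:

```python
def load_env(path='.env'):
    """Minimal .env loader: KEY=value lines; '#' comments and blanks ignored."""
    env = {}
    try:
        with open(path) as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith('#') or '=' not in line:
                    continue
                key, _, value = line.partition('=')
                env[key.strip()] = value.strip().strip('\'"')
    except FileNotFoundError:
        pass  # missing file -> empty settings
    return env

# hypothetical usage (assumes a .env file like the Node version's):
# API_KEY = load_env().get('API_KEY')
```

The app's settings.py approach is simpler, but a loader like this keeps one config format across both language versions.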
app.all('/', upload.single('file'), async (req, rsp) => {
  let context = {
    upload_url: `${req.protocol}://${req.get('host')}${req.originalUrl}`
  };
  if (req.method === 'POST') {
    const upload = req.file;
    if (!upload) {
      context.error = 'No uploaded file';
      return rsp.render(JINUN_TMPL, context);
    }
    const fname = upload.originalname.trim();
    if (!fname) {
      context.error = 'Upload must have file name';
      return rsp.render(JINUN_TMPL, context);
    }
    const allowed = await is_allowed_file(fname);
    if (!allowed) {
      context.error = 'Only JPG/PNG/GIF files allowed';
      return rsp.render(JINUN_TMPL, context);
    }
    const prompt = req.body.prompt.trim();
    if (!prompt) {
      context.error = 'LLM prompt missing';
      return rsp.render(JINUN_TMPL, context);
    }
    const image = upload.buffer;
    const mimeType = upload.mimetype;
    var thumb_b64;
    try {
      const thumb = await sharp(image);
      const thumb_buf = await thumb.resize({ width: THUMB_DIMS[0] }).toBuffer();
      thumb_b64 = thumb_buf.toString('base64');
    }
    catch (ex) {
      context.error = 'Invalid image file/format';
      return rsp.render(JINUN_TMPL, context);
    }
    context.model = MODEL_NAME;
    context.prompt = prompt;
    context.image = `data:${mimeType};base64,${thumb_b64}`;
    const payload = { inlineData: { data: image.toString('base64'), mimeType } };
    const response = await GENAI.models.generateContent({
      model: MODEL_NAME,
      contents: [prompt, payload]
    });
    context.result = marked.parse(response.text);
  }
  return rsp.render(JINUN_TMPL, context);
});

app.listen(PORT, () => console.log(`* Running on port ${PORT}`));
The main handler is a twin of the Python version, consisting of the same major sections:
- Set `upload_url` in context and error-check the input
- Create a thumbnail and base64-encode it
- Send the image thumb, model, prompt, and results from Gemini to the template
The web template is nearly identical to the Python version, so just run the web app with node main.mjs (or node main.js). All files are in the Node.js repo folder.
Dotenv now optional
You can remove the explicit installation and configuration of `dotenv` in modern Node.js releases (20.6.0+). Be sure to point to the `.env` file when executing your script if you take the relevant lines out:
$ node --env-file=.env main.mjs    # (or main.js)
Summary
Many are excited to delve into the world of GenAI & LLMs, and while user-friendly "Hello World!" scripts are a great way to get started, seeing how to integrate the Gemini API into web apps brings developers one step closer to productizing something. This post highlights a basic web app that takes a prompt and image as input, sends them to the Gemini API, and displays the results along with an empty form to analyze the next image. From here, use your imagination as to what you can build on top of this baseline "MVP" web app.
If you found an error in this post, a bug in the code, or have a topic you want me to cover in the future, drop a note in the comments below or file an issue at the repo. Also check out other posts in this series covering the Gemini API. Thanks for reading, and I hope to meet you at an upcoming event soon... see the travel calendar at the bottom of my consulting site.
PREV POST: Part 2: Gemini API 102: Next steps beyond "Hello World!"
NEXT POST: Part 4: Generate audio clips with Gemini 2.0 Flash
References
Below are links to resources for this post or additional references you may find useful.
Blog post code samples
Google AI, Gemini, Gemini API
- Google AI home
- General GenAI docs
- Gemini API overview
- QuickStart page
- QuickStart code
- GenAI API reference
- Gemini home page
- Gemini models overview
- Gemini models information (& quotas)
Other relevant content by the author
WESLEY CHUN, MSCS, is a Google Developer Expert (GDE) in Google Cloud (GCP) & Google Workspace (GWS), author of Prentice Hall's bestselling "Core Python" series, co-author of "Python Web Development with Django", and has written for Linux Journal & CNET. He's currently an AI Technical Program Manager at Red Hat focused on upstream open source projects that make their way into Red Hat AI products. In his spare time, Wesley helps clients with Google integrations, App Engine migrations, and Python training & engineering. He was one of the original Yahoo!Mail engineers and spent 13+ years on various Google product teams, speaking on behalf of their APIs, producing sample apps, codelabs, and videos for serverless migration and GWS developers. Wesley holds degrees in Computer Science, Mathematics, and Music from the University of California, is a Fellow of the Python Software Foundation, and loves to travel to meet developers worldwide. Follow he/him @wescpy on Tw/X, BS, and his technical blog. Find this content useful? Contact CyberWeb for professional services or buy him a coffee (or tea)!