Wesley Chun (@wescpy)

Gemini API 102a: Putting together basic GenAI web apps

TL;DR:

The first pair of posts in this ongoing Gemini API series provide a thorough introduction to using the API, primarily from Google AI. (Existing GCP users can easily migrate their code to its Vertex AI platform without much difficulty.) However, while command-line scripts are a great way to get started, they aren't how you're going to reach users. Aspiring data scientists and AI professionals make great use of powerful tools like Jupyter Notebooks, but being able to create/prototype web apps may also be useful. This post aims to address both of these "issues" by demonstrating use of the Gemini API in a basic genAI web app using Flask (Python) or Express.js (Node.js), all in about 100 lines of code!

Nov 2025 update: Updated sample code to Gemini 2.5 (Flash). For more details, see this post (and its corresponding repo), which also serves as a migration guide to the new client library. Also added is a FastAPI version to join the original Flask app; it is integrated into the post below.

Build with Gemini

Introduction

Welcome to the blog covering Google developer technologies. Whether you're learning how to code Google Maps, export Google Docs as PDF, or explore [serverless computing with Google](/wescpy/a-broader-perspective-of-serverless-1md1), this is the right place to be. You'll also find posts on foundational knowledge like credentials, including API keys and OAuth client IDs... all of this from Python and sometimes Node.js.

If you've been following along in this series covering the Gemini API, you now know how to perform text-only and multimodal queries, use streaming, and multi-turn (or "chat") conversations, all from the command-line issued against one of the Gemini LLMs (large language models). It's time to take it to the next level by building a basic web app that uses the Gemini API.

Application

Regardless of whether you build the app with Python or Node.js, the app works identically. End-users upload an image file (JPG, PNG, and GIF formats supported) along with a text prompt. The app then performs a multimodal query to the Gemini API using the latest 2.5 Flash model, then displays a reduced-size version of the image along with the prompt as well as the generated result from the model.

The Python app comes in both Flask and FastAPI versions while the Node app uses Express.js. The Jinja2 web templating system is supported by both Python frameworks and uses the same syntax as Nunjucks, a similar template system for Node.js inspired by Jinja2.

All versions of the app, with comments, plus the web templates can be found in the repo folder. Be sure you've created an API key and stored it as API_KEY = '<YOUR_API_KEY>' in settings.py for Python or .env for Node.js before jumping into the code. We'll start with Python first. (Settings environment templates for both files are available in the repo [see below].)

Python

Application files

| App | Description | Platform |
|---|---|---|
| python/settings_TMPL.py | Environment settings template | Python 3 |
| python/requirements.txt | 3rd-party packages (Flask) | Python 3 |
| python/main.py | Flask sample app | Python 3 |
| python/templates/index.html | Web template | Jinja2 (identical to Nunjucks) |
| python/fastapi/requirements.txt | 3rd-party packages (FastAPI) | Python 3 |
| python/fastapi/main.py | FastAPI sample app | Python 3 |

For the FastAPI version, grab the files from the fastapi subfolder and overwrite their Flask equivalents in the main folder; all other files remain as-is.

Setup

  1. Ensure your Python (and pip) installation is up-to-date (3.9+ recommended)
  2. (optional) Create & activate a virtual environment ("virtualenv") for isolation
    • python3 -m venv .myenv; source .myenv/bin/activate
    • For the commands below, depending on your system configuration, you will use one of (pip, pip3, python3 -m pip), but the instructions are generalized to pip.
  3. (optional) Update pip and install uv: pip install -U pip uv
  4. Install all packages (old & new client libraries): uv pip install -Ur requirements.txt (drop uv if you didn't install it)

Code walk-through

The only new package installed this time is the Flask micro web framework, which comes with the Jinja2 templating system. Let's dive into the main application file, one chunk at a time, starting with the imports:

Flask

from base64 import b64encode
import io

from flask import Flask, render_template, request, url_for
from werkzeug.utils import secure_filename
from PIL import Image
import mistune

from google import genai
from settings import API_KEY

The io standard library package has an object (io.BytesIO) used in this app as an in-memory "disk file" for the thumbnail. You create your own settings.py local file to hold the API key (it's not in the repo). There are some 3rd-party packages as well:

| Module/package | Use |
|---|---|
| flask | Flask: popular synchronous micro web framework |
| werkzeug | Werkzeug: collection of WSGI web app utilities |
| PIL | Pillow: flexible fork of well-known Python Imaging Library (PIL) |
| mistune | Mistune: popular Python Markdown parsing library |
| google.genai | Google GenAI SDK: provides access to Gemini API & models |

Werkzeug and the Jinja2 templating system are Flask dependencies. For the most part, these are resources requested directly from Flask, save for the single explicit import of Werkzeug's secure_filename() function, which is used in the app.
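As a rough illustration, a simplified sketch of what secure_filename() does (not Werkzeug's actual implementation, which handles more cases such as Windows device names) might look like this:

```python
import re

def secure_filename_sketch(fname: str) -> str:
    """Simplified sketch of Werkzeug's secure_filename():
    strip path separators, keep only safe characters."""
    fname = fname.replace('/', ' ').replace('\\', ' ')
    # collapse anything that isn't alphanumeric, dot, dash, or underscore
    return re.sub(r'[^A-Za-z0-9_.-]+', '_', fname).strip('._')

print(secure_filename_sketch('my cool movie.mov'))    # my_cool_movie.mov
print(secure_filename_sketch('../../../etc/passwd'))  # etc_passwd
```

The point is that whatever the browser sends as a filename is untrusted input, so it gets sanitized before any further use.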

The FastAPI version is similar:

FastAPI

from base64 import b64encode
import io

from fastapi import FastAPI, Request, UploadFile, File, Form
from fastapi.responses import HTMLResponse
from fastapi.templating import Jinja2Templates
from PIL import Image
import mistune

from google import genai
from settings import API_KEY    # can also use .env & python-dotenv
| Module/package | Use |
|---|---|
| fastapi[standard] | FastAPI: popular asynchronous micro web framework and related packages |
| mistune | Mistune: popular Python Markdown parsing library |
| pillow | Pillow: flexible fork of well-known Python Imaging Library (PIL) |
| google.genai | Google GenAI SDK: provides access to Gemini API & models |

The fastapi[standard] entry is an extended package which includes fastapi, uvicorn (ASGI server), python-multipart (form data), and fastapi-cli (FastAPI command-line tool).
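Given the imports above, the FastAPI requirements file presumably boils down to something like this (the exact, possibly pinned, contents live in the repo):

```
fastapi[standard]
google-genai
mistune
pillow
```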

(both)

ALLOW_EXTS = {'png', 'jpg', 'jpeg', 'gif'}
MODEL_NAME = 'gemini-2.5-flash'
THUMB_DIMS = 480, 360
JINUN_TMPL = 'index.html'
| Constant | Use |
|---|---|
| ALLOW_EXTS | Image file types supported by the app |
| MODEL_NAME | Gemini LLM model to use in this app |
| THUMB_DIMS | Thumbnail dimensions |
| JINUN_TMPL | Jinja2/Nunjucks template |

Flask

app = Flask(__name__)
GENAI = genai.Client(api_key=API_KEY)

FastAPI

app = FastAPI()
templates = Jinja2Templates(directory='templates')
GENAI = genai.Client(api_key=API_KEY)

After the web framework is initialized, the Gemini API client object is instantiated using the provided API key for authorization. The FastAPI version also sets the template folder explicitly (templates), whereas that's already the default in Flask.

(both)

def is_allowed_file(fname: str) -> bool:
    return '.' in fname and fname.rsplit('.', 1)[1].lower() in ALLOW_EXTS

The is_allowed_file() function takes an uploaded file's name and checks it against the image file types supported by the app per ALLOW_EXTS. Everything else is the main application.
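A few sample calls show its behavior; the function is self-contained enough to try directly:

```python
ALLOW_EXTS = {'png', 'jpg', 'jpeg', 'gif'}

def is_allowed_file(fname: str) -> bool:
    # require a dot, then compare the (lowercased) last extension
    return '.' in fname and fname.rsplit('.', 1)[1].lower() in ALLOW_EXTS

print(is_allowed_file('waterfall.JPG'))   # True -- check is case-insensitive
print(is_allowed_file('archive.tar.gz'))  # False -- only the last extension counts
print(is_allowed_file('README'))          # False -- no extension at all
```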

App operation

Before we dive into that, let's visualize what the app does so you can connect the dots more easily when reviewing the rest of the code. When you hit the app the first time, you get an initial empty form view:


[IMG] Gemini API web app: empty form

You can clearly see the primary form elements: a file-picker to choose the image, a text field for the LLM prompt (which has a default of "Describe this image"), and a submit button. From here, a user is expected to choose a locally-stored image file:


[IMG] Gemini API web app: image file picker

I picked the waterfall picture from the previous post. After the image has been selected, modify the prompt to something of your choosing. Below, I changed the default to "Where is this and what is it?"


[IMG] Gemini API web app: image and prompt set

With the image selected and prompt set, clicking on the submit button "causes things to happen," and the resulting screen shows a smaller thumbnail of the selected image, the prompt entered, the model used, and the LLM results:


[IMG] Gemini API web app: results

Note that there's always a blank form at the bottom of every step so users can move on to another image once they're done with the current one. With that, let's look at the main handler:

Main handler: error-checking

The first half of the main handler consists of checking for bad input... take a look:

Flask

@app.route('/', methods=['GET', 'POST'])
def main():
    context = {'upload_url': url_for(request.endpoint)}

    if request.method == 'POST':
        upload = request.files.get('file')
        if not upload:
            context['error'] = 'No uploaded file'
            return render_template(JINUN_TMPL, **context)

        fname = secure_filename(upload.filename.strip())
        if not fname:
            context['error'] = 'Upload must have file name'
            return render_template(JINUN_TMPL, **context)

        if not is_allowed_file(fname):
            context['error'] = 'Only JPG/PNG/GIF files allowed'
            return render_template(JINUN_TMPL, **context)

        prompt = request.form.get('prompt', '').strip()
        if not prompt:
            context['error'] = 'LLM prompt missing'
            return render_template(JINUN_TMPL, **context)

This is the only handler in the app, supporting both POST and GET requests, meaning it handles the initial empty form page as well as actual work requests (via POST). Since the page always shows a blank form at the bottom, the first thing you see is the template context being set with the upload URL, which points the form back at the same endpoint (/).

The next four sections handle various types of bad input:

  • No uploaded file
  • Upload with no file name
  • Upload with unsupported file type
  • No LLM prompt

In each of these situations, an error message is set for the template, and the user is sent straight back to the web template. Any errors are highlighted for reference followed by the same empty form giving the user a chance to correct their mistake. Here's what it looks like if you forgot to upload a file:


[IMG] Gemini API web app: no file error

FastAPI

@app.post('/')
async def process_form(request: Request, file: UploadFile = File(...), prompt: str = Form(...)):
    context: dict = {'upload_url': '/', 'request': request}

    if not file.file:
        context['error'] = 'No uploaded file'
        return templates.TemplateResponse(JINUN_TMPL, context)

    fname = file.filename.strip()
    if not fname:
        context['error'] = 'Upload must have file name'
        return templates.TemplateResponse(JINUN_TMPL, context)

    if not is_allowed_file(fname):
        context['error'] = 'Only JPG/PNG/GIF files allowed'
        return templates.TemplateResponse(JINUN_TMPL, context)

    if not prompt:
        context['error'] = 'LLM prompt missing'
        return templates.TemplateResponse(JINUN_TMPL, context)

It's not possible to support both POST and GET requests in a single FastAPI handler here because the POST handler expects form fields which aren't present in a GET request, so they need to be split up. The code above is just the POST handler... we'll see the equivalent GET soon. The rest of the error-checking logic is identical save for the different template rendering syntax.

Main handler: core functionality

The last chunk of code is where all the magic happens.

Flask

        try:
            image = Image.open(upload)
            thumb = image.copy()
            thumb.thumbnail(THUMB_DIMS)
            img_io = io.BytesIO()
            thumb.save(img_io, format=image.format)
            img_io.seek(0)
        except IOError:
            context['error'] = 'Invalid image file/format'
            return render_template(JINUN_TMPL, **context)

        context['model']  = MODEL_NAME
        context['prompt'] = prompt
        thumb_b64 = b64encode(img_io.getvalue()).decode('ascii')
        context['image']  = f'data:{upload.mimetype};base64,{thumb_b64}'
        context['result'] = mistune.html(GENAI.models.generate_content(
                model=MODEL_NAME, contents=(prompt, image)).text)

    return render_template(JINUN_TMPL, **context)

if __name__ == '__main__':
    import os
    app.run(debug=True, threaded=True, host='0.0.0.0',
            port=int(os.environ.get('PORT', 8080)))

Assuming all the inputs pass muster, the real work begins. A copy of the original image is made, which is then converted into a smaller thumbnail for display. It's saved to an in-memory file object (io.BytesIO) and later base64-encoded for the template. Any error occurring during this image processing results in an error message sent to the template.

If all has succeeded thus far, then it goes through the final round of being sent to the LLM for analysis. Before that happens, all of the necessary fields for a successful computation are sent to the template context, including the prompt, model used, and the base64-encoded thumbnail. The Gemini API is passed the image and prompt, and the returned result is converted from Markdown into HTML by Mistune and added to the context for rendering.
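The in-memory file and data: URI steps are pure standard library; with stand-in thumbnail bytes (a hypothetical placeholder, not real JPEG data), that part of the pipeline looks like this:

```python
from base64 import b64encode
import io

# stand-in for the Pillow thumbnail written via thumb.save(img_io, ...)
img_io = io.BytesIO()
img_io.write(b'\xff\xd8\xff fake JPEG thumbnail')
img_io.seek(0)    # rewind so reads start from the top

# base64-encode and inline the image as a data: URI -- no static file to serve
thumb_b64 = b64encode(img_io.getvalue()).decode('ascii')
data_uri = f'data:image/jpeg;base64,{thumb_b64}'
```

Inlining the thumbnail this way means the app never has to write the upload to disk or serve a separate image URL.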

Whether a plain GET, or a POST resulting in all of this processing, the template is then rendered, wrapping up the last part of the handler. The rest of the code just kicks off the Flask development server on port 8080 to run the app. (The "devserver" is great for development and testing, but you would choose a more robust server for production.)

FastAPI

    try:
        image = Image.open(file.file)
        thumb = image.copy()
        thumb.thumbnail(THUMB_DIMS)
        img_io = io.BytesIO()
        thumb.save(img_io, format=image.format)
        img_io.seek(0)
    except IOError:
        context['error'] = 'Invalid image file/format'
        return templates.TemplateResponse(JINUN_TMPL, context)

    context['model']  = MODEL_NAME
    context['prompt'] = prompt
    thumb_b64 = b64encode(img_io.getvalue()).decode('ascii')
    context['image']  = f'data:{file.content_type};base64,{thumb_b64}'
    context['result'] = mistune.html(GENAI.models.generate_content(
            model=MODEL_NAME, contents=(prompt, image)).text)

    return templates.TemplateResponse(JINUN_TMPL, context)


@app.get('/')
async def display_form(request: Request):
    context: dict = {'upload_url': '/', 'request': request}
    return templates.TemplateResponse(JINUN_TMPL, context)

if __name__ == '__main__':
    import os
    import uvicorn
    uvicorn.run('main:app', host='0.0.0.0', reload=True,
                port=int(os.environ.get('PORT', 8080)))

The FastAPI version is nearly identical again, with two major differences:

  1. The need for a separate GET handler, display_form()
  2. Use of the Uvicorn asynchronous (ASGI) web server instead of the Flask dev server

Web template

Now, let's look at the web template to tie the whole thing together:

(both)

<!doctype html>
<html lang="en-US">
<head>
<title>GenAI image analyzer serverless example</title>

<style>
body {
  font-family: Verdana, Helvetica, sans-serif;
  background-color: #DDDDDD;
}
</style>
</head>
<body>

<h1>GenAI basic image analyzer (v0.5)</h1>

{% if error %}
  <h3>Error on previous request</h3>
  <p style="color: red;">{{ error }}</p>
  <hr>
{% endif %}

{% if result and image %}
  <h3>Image uploaded</h3>
  <img src="{{ image }}" alt="thumbnail">

  <h3>LLM analysis</h3>
  <b>Prompt received:</b> {{ prompt }}<p></p>
  <b>Model used:</b> {{ model }}<p></p>
  <b>Model response:</b> {{ result|safe }}<p></p>
  <hr>
{% endif %}

<h3>Analyze an image</h3>

<form action="{{ upload_url }}" onsubmit="submitted()" method="POST" enctype="multipart/form-data">
  <label for="file">Upload image to analyze:</label><br>
  <input type="file" name="file"><p></p>
  <label for="prompt">Image prompt for LLM:</label><br>
  <input type="text" name="prompt" value="Describe this image"><p></p>
  <input type="submit" id="submit">
</form>

<script>
function submitted() {
  document.getElementById("submit").value = "Processing...";
}
</script>
</body>
</html>

The initial headers and limited CSS (cascading style sheets) styling show up at the top, followed by the app title. The error section comes next, displayed only if an error occurs. If an image is processed successfully, the results (via the safe filter) are displayed along with a thumbnail version of the image, LLM name, and prompt. Finally, an empty form shows up afterwards in case the user has another image to analyze. To run the app, just execute python main.py (or python3).

Both the app (main.py) and template (templates/index.html) can be found in the python folder of the repo.

Node.js

Application files

| App | Description | Platform |
|---|---|---|
| nodejs/.env_TMPL | Environment settings template | Node |
| nodejs/package.json | 3rd-party packages | Node (all) |
| nodejs/main.js | Express.js sample app | Node (CommonJS script) |
| nodejs/main.mjs | Express.js sample app | Node (ECMAScript module) |
| nodejs/templates/index.html | Web template | Nunjucks (identical to Jinja2) |

Setup

The Node version of the app is a near-mirror image of the Python version, and the web template is identical. Nunjucks was inspired by Jinja2 (Python) and uses its syntax, so that's why it was chosen over other templating systems like EJS. Next steps:

  1. Ensure your Node (including NPM) installation is up-to-date (18+ recommended)
  2. Install packages: npm i

Here is the modern JavaScript ECMAScript module, main.mjs:

import 'dotenv/config';
import express from 'express';
import multer from 'multer';
import nunjucks from 'nunjucks';
import sharp from 'sharp';
import { marked } from 'marked';
import { GoogleGenAI } from '@google/genai';

const PORT = process.env.PORT || 8080;
const ALLOW_EXTS = ['png', 'jpg', 'jpeg', 'gif'];
const MODEL_NAME = 'gemini-2.5-flash';
const THUMB_DIMS = [480, 360];
const JINUN_TMPL = 'index.html';

const app = express();
app.use(express.urlencoded({ extended: false }));
nunjucks.configure('templates', { autoescape: true, express: app });
const upload = multer({ storage: multer.memoryStorage() });
const GENAI = new GoogleGenAI({ apiKey: process.env.API_KEY }); // API key authz

async function is_allowed_file(fname) {
    return (fname.includes('.') && ALLOW_EXTS.includes(
        fname.toLowerCase().slice(((fname.lastIndexOf('.') - 1) >>> 0) + 2)));
}

The major difference in this Node version vs. Python is that there is more initialization required for Express.js middleware, such as setting up the Nunjucks templating system and configuring the Multer system to handle file uploads. These are the 3rd-party packages you see imported at the top.

Package Use
dotenv Dotenv: adds environment variables from .env
express Express.js: popular micro web framework
multer Multer: middleware to handle file uploads
nunjucks Nunjucks: JavaScript templating system
sharp Sharp: high-performance image processing library
marked Marked: popular Node.js Markdown parsing library
@google/genai Google GenAI SDK: provides access to Gemini API & models

Replace the imports with these require() calls to convert to CommonJS -- all other lines remain identical to the ECMAScript module:

require('dotenv').config();
const express = require('express');
const multer = require('multer');
const nunjucks = require('nunjucks');
const sharp = require('sharp');
const { marked } = require('marked');
const { GoogleGenAI } = require('@google/genai');

Python uses fewer 3rd-party packages explicitly because the (Jinja2) templating system is a Flask dependency, and Flask itself handles file uploads. The Python app also uses settings.py, a nod to Django, instead of a .env file like Node.js, which requires dotenv.
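One subtle line worth calling out before the main handler is the extension extraction inside is_allowed_file() above: the `>>> 0` coercion makes slice() return an empty string, rather than the whole filename, when there's no dot at all. A standalone sketch of just that expression:

```javascript
// extract a lowercased file extension, or '' when there is no dot at all
function extOf(fname) {
  // no dot: lastIndexOf() is -1, so (-1 - 1) >>> 0 wraps to a huge unsigned
  // index, and slicing past the end of the string yields ''
  return fname.toLowerCase().slice(((fname.lastIndexOf('.') - 1) >>> 0) + 2);
}

console.log(extOf('waterfall.JPG'));   // 'jpg'
console.log(extOf('archive.tar.gz'));  // 'gz'
console.log(extOf('README'));          // ''
```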

app.all('/', upload.single('file'), async (req, rsp) => {
    let context = {
        upload_url: `${req.protocol}://${req.get('host')}${req.originalUrl}`
    };

    if (req.method === 'POST') {
        const upload = req.file;
        if (!upload) {
            context.error = 'No uploaded file';
            return rsp.render(JINUN_TMPL, context);
        }
        const fname = upload.originalname.trim();
        if (!fname) {
            context.error = 'Upload must have file name';
            return rsp.render(JINUN_TMPL, context);
        }
        const allowed = await is_allowed_file(fname);
        if (!allowed) {
            context.error = 'Only JPG/PNG/GIF files allowed';
            return rsp.render(JINUN_TMPL, context);
        }
        const prompt = (req.body.prompt || '').trim();
        if (!prompt) {
            context.error = 'LLM prompt missing';
            return rsp.render(JINUN_TMPL, context);
        }

        const image = upload.buffer;
        const mimeType = upload.mimetype;
        let thumb_b64;
        try {
            const thumb = await sharp(image);
            const thumb_buf = await thumb.resize({ width: THUMB_DIMS[0] }).toBuffer();
            thumb_b64 = thumb_buf.toString('base64');
        }
        catch (ex) {
            context.error = 'Invalid image file/format';
            return rsp.render(JINUN_TMPL, context);
        }

        context.model = MODEL_NAME;
        context.prompt = prompt;
        context.image = `data:${mimeType};base64,${thumb_b64}`;
        const payload = { inlineData: { data: image.toString('base64'), mimeType } };
        const response = await GENAI.models.generateContent({
            model: MODEL_NAME,
            contents: [prompt, payload]
        });
        context.result = marked.parse(response.text);
    }
    return rsp.render(JINUN_TMPL, context);
});

app.listen(PORT, () => console.log(`* Running on port ${PORT}`));

The main handler is a twin of the Python version, composed of the same major sections:

  1. Set upload_url in context and error-check
  2. Create thumbnail and base64-encode it
  3. Send image thumb, model, prompt, and results from Gemini to template

As the web template is nearly identical to the Python version, run the web app with node main.mjs (or node main.js). All files are in the Node.js repo folder.

Dotenv now optional

You can remove explicit installation and configuration of dotenv in modern Node.js releases (20.6.0+). Be sure to point to the .env file when executing your script if you take the relevant lines out:

$ node --env-file=.env main.mjs # (or main.js)

Summary

Many are excited to delve into the world of GenAI & LLMs, and while user-friendly "Hello World!" scripts are a great way to get started, seeing how to integrate the Gemini API into web apps brings developers one step closer to productizing something. This post highlights a basic web app that takes a prompt and an image as input, sends them to the Gemini API, and displays the results along with an empty form to analyze the next image. From here, use your imagination as to what you can build on top of this baseline "MVP" web app.

If you found an error in this post, a bug in the code, or have a topic you want me to cover in the future, drop a note in the comments below or file an issue at the repo. Also check out other posts in this series covering the Gemini API. Thanks for reading, and I hope to meet you at an upcoming event soon... see the travel calendar at the bottom of my consulting site.

PREV POST: Part 2: Gemini API 102: Next steps beyond "Hello World!"
NEXT POST: Part 4: Generate audio clips with Gemini 2.0 Flash

References

Below are links to resources for this post or additional references you may find useful.

Blog post code samples

Google AI, Gemini, Gemini API

Other relevant content by the author



WESLEY CHUN, MSCS, is a Google Developer Expert (GDE) in Google Cloud (GCP) & Google Workspace (GWS), author of Prentice Hall's bestselling "Core Python" series, co-author of "Python Web Development with Django", and has written for Linux Journal & CNET. He's currently an AI Technical Program Manager at Red Hat focused on upstream open source projects that make their way into Red Hat AI products. In his spare time, Wesley helps clients with Google integrations, App Engine migrations, and Python training & engineering. He was one of the original Yahoo!Mail engineers and spent 13+ years on various Google product teams, speaking on behalf of their APIs, producing sample apps, codelabs, and videos for serverless migration and GWS developers. Wesley holds degrees in Computer Science, Mathematics, and Music from the University of California, is a Fellow of the Python Software Foundation, and loves to travel to meet developers worldwide. Follow he/him @wescpy on Tw/X, BS, and his technical blog. Find this content useful? Contact CyberWeb for professional services or buy him a coffee (or tea)!
