leslysandra for Google Developer Experts

Posted on Jun 10

No Cloud, No Cost: Build an Offline Visual AI Agent with Gemma 4

#tutorial #ai #opensource #gemma

With Google’s Gemma 4 12B, you can host a capable assistant that natively understands text and images right on your everyday laptop. Because it is an open-weights model that you run locally, your data never leaves your machine.

In this tutorial, we will build "The Air-Gapped Field Reporter"—an offline agent designed to work entirely off-grid. If you are an intermediate developer who knows some Python but has never run an AI model locally, this guide is for you.

First, the Payoff: What We Are Building

Imagine you are a surveyor or investigator working completely off-grid with zero internet. You snap a photo of a field site, type a quick inquiry, and feed them into your script.

Without hitting a single external cloud server, our local Gemma 4 agent cross-references your prompt with the image to generate a markdown report like this one (the exact wording varies from run to run, and speed depends on your hardware):

Sample image: field_sample.jpg

### 📋 FIELD DISPATCH: URBAN HAZARD AUDIT
**Status:** Completed (Processed Fully Offline)

#### 1. Visual Signage Identification
* **Sign Type:** Coastal Hazard Warning Sign (Tsunami Evacuation/Threat Zone).
* **Text Extracted:** "ZONA AMENAZA TSUNAMI" (Partially obscured by weathering/stickers).
* **Iconography:** Yellow triangle containing a stylized graphic of a massive breaking wave.

#### 2. Environmental & Infrastructure Context
* **Location Profile:** Coastal urban roadway with an integrated, designated two-way bicycle path (ciclovía).
* **Terrain Feature:** Dense residential housing built directly onto a steep hillside background layout. 
* **Vulnerability Assessment:** High density of infrastructure at sea level directly adjacent to the hazard zone sign, with the primary evacuation route likely scaling the background hillside.

#### 3. Immediate Action Items
* [ ] Schedule physical maintenance for the signage to remove obscuring stickers and graffiti on the lower text block.
* [ ] Cross-reference the GPS coordinate of this sign with the municipal digital evacuation blueprint to ensure the bike path remains clear during emergencies.

Let's look at how this works under the hood, and then we will write the code to make it happen.

The Core Concept: Open-Weights vs. Closed APIs

If you've only used tools like ChatGPT or the Gemini web app, you are using a Closed API. You send your data across the internet into a corporate black box, and they send an answer back.

With an Open-Weights model like Gemma 4, Google provides the actual trained "brain" file. You download it, drop it onto your machine, and run it locally.

[Closed API]     ----> Your Private Data ----> Sent to Cloud Server ----> Black Box
[Open-Weights]   ----> Your Private Data ----> Stays on Your Laptop ----> Total Control & Privacy
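
One way to make the "stays on your laptop" promise checkable in code is to confirm that the URL a script talks to points at the loopback interface. The sketch below is illustrative only: `is_loopback_url` is a hypothetical helper, not part of any library, and it treats the name `localhost` as local by convention (a modified hosts file could break that assumption). The first example URL is the local server address used later in this tutorial; the second is a made-up remote address.

```python
import ipaddress
from urllib.parse import urlparse


def is_loopback_url(url: str) -> bool:
    """Return True if the URL's host is 'localhost' or a loopback IP address."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        # Assumption: 'localhost' maps to the loopback interface on this machine.
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any other hostname (e.g. a cloud API domain) is treated as remote.
        return False


if __name__ == "__main__":
    print(is_loopback_url("http://localhost:11434/api/chat"))  # True
    print(is_loopback_url("https://api.example.com/v1/chat"))  # False
```

A guard like this could run once at startup, so a misconfigured URL fails loudly instead of quietly sending a photo to a remote host.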

Why Gemma 4 is a Game-Changer for Laptops

Historically, running a smart 12-Billion parameter model locally required an expensive gaming rig with massive graphics memory (VRAM). If you tried to shrink the model to make it fit on a normal computer, it usually became "forgetful" or started spitting out gibberish.

Gemma 4 fixes this with two massive breakthroughs:

Quantization-Aware Training (QAT): Quantization stores the model's weights at lower numeric precision, which works like compression. Traditional post-training quantization shrinks a model after it’s built, which can strip away important details. Google trained Gemma 4 to expect that compression while it was still learning. As a result, the quantized version runs on a standard 16GB RAM laptop, taking up about 7.6GB of memory with little loss in reasoning quality.
The Encoder-Free Design: Most old-school AI models are like a Frankenstein assembly line. They use one separate AI tool to process images, another to transcribe audio, and a third to think about the text. This slows down your laptop and hogs memory. Gemma 4 processes text and pixels directly inside a single, unified model backbone. It's faster, lighter, and much more memory-efficient.
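
To see why precision matters so much for laptop memory, here is a back-of-the-envelope sketch of raw weight storage at different bit widths. The bit widths are assumptions chosen for illustration; this article does not state which precision the 7.6GB figure corresponds to, and the estimate ignores everything except the weights themselves (runtime buffers, context cache, and any layers kept at higher precision). `estimate_weight_gb` is a hypothetical helper.

```python
def estimate_weight_gb(num_params: float, bits_per_weight: float) -> float:
    """Raw weight storage in gigabytes (1 GB = 10**9 bytes), ignoring all overhead."""
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # 12 billion parameters at three illustrative precisions.
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit weights: ~{estimate_weight_gb(12e9, bits):.1f} GB")
```

Under these assumptions, 16-bit weights alone would need about 24 GB, more than a 16GB laptop has, while 4-bit weights need about 6 GB, which is in the same range as the footprint quoted above.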

Step 1: Spin Up Your Local AI Engine

Instead of compiling complex C++ code from scratch, we will use Ollama, a lightweight tool that handles local model hosting with a single command.

Download and install Ollama for your operating system (Mac, Windows, or Linux).
Open your terminal or command prompt and run the following command to download and start the Gemma 4 12B model from the Ollama library:

ollama run gemma4:12b

Once the download finishes (the only step that needs an internet connection), Ollama runs a local server quietly in the background at http://localhost:11434. It exposes a local HTTP API that you call much as you would call a cloud provider's API, but it works completely offline.
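
Before writing the agent, it can help to confirm that the server is reachable and the model is installed. The sketch below uses only the standard library and queries Ollama's `/api/tags` endpoint, which lists locally installed models. It assumes the response is a JSON object with a `models` list whose entries carry a `name` field; `parse_model_names` and `list_local_models` are hypothetical helper names.

```python
import json
import urllib.request

OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"


def parse_model_names(tags_response: dict) -> list:
    """Pull model names out of a parsed /api/tags response body."""
    return [m["name"] for m in tags_response.get("models", []) if "name" in m]


def list_local_models(url: str = OLLAMA_TAGS_URL, timeout: float = 3.0) -> list:
    """Return installed model names, or an empty list if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return parse_model_names(json.load(resp))
    except (OSError, ValueError):
        # OSError covers refused connections and timeouts; ValueError covers bad JSON.
        return []


if __name__ == "__main__":
    models = list_local_models()
    if models:
        print("Ollama is running. Installed models:", ", ".join(models))
    else:
        print("No models found. Is the Ollama server running?")
```

If `gemma4:12b` does not appear in the list, rerun the `ollama run` command above while you still have a connection.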

Step 2: Write the Python Orchestrator

Now, let's write the Python script that captures our local assets—a field photo and a structured instruction text—and sends them to our local Gemma 4 engine.

First, make sure you have the required library installed:

pip install requests

Create a file named reporter_agent.py and paste the following code:

import os
import base64
import requests

class AirGappedReporter:
    def __init__(self, api_url="http://localhost:11434/api/chat"):
        self.api_url = api_url

    def _encode_file_to_b64(self, file_path):
        """Converts a local image binary to a base64 string for local engine ingestion."""
        if not os.path.exists(file_path):
            raise FileNotFoundError(f"Required asset not found at: {file_path}")
        with open(file_path, "rb") as f:
            return base64.b64encode(f.read()).decode('utf-8')

    def generate_field_report(self, prompt, image_path=None):
        """Assembles the multimodal payload and sends it to our local Ollama server."""

        # Build the structured message for Gemma 4
        message_content = {
            "role": "user",
            "content": prompt,
            "images": []
        }

        # If a photo is attached, encode it and add it to the image payload array
        if image_path:
            message_content["images"].append(self._encode_file_to_b64(image_path))

        payload = {
            "model": "gemma4:12b",
            "messages": [
                {
                    "role": "system", 
                    "content": (
                        "You are an offline forensic field agent. Structure your output clearly using markdown. "
                        "Only report on features visually verified within the asset provided."
                    )
                },
                message_content
            ],
            # Google's official recommended baseline configurations for Gemma 4
            "options": {
                "temperature": 1.0,  
                "top_p": 0.95,
                "top_k": 64
            },
            "stream": False
        }

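        try:
            # A timeout keeps the script from hanging forever if the server stalls;
            # local inference can be slow, so the limit is generous.
            response = requests.post(self.api_url, json=payload, timeout=300)
            response.raise_for_status()
            return response.json()["message"]["content"]
        except requests.exceptions.RequestException as e:
            return f"Error connecting to local AI engine: {e}"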

# --- Execution Loop ---
if __name__ == "__main__":
    agent = AirGappedReporter()

    # Define your local image asset (Make sure to place a real image file in this folder!)
    my_photo = "field_sample.jpg" 

    directive = (
        "Perform an environmental audit of this image asset. Highlight any structural anomalies, "
        "assess immediate stability risks, and provide actionable next steps."
    )

    print("Processing assets locally with Gemma 4 12B...")
    final_report = agent.generate_field_report(
        prompt=directive, 
        image_path=my_photo
    )

    print("\n" + "="*20 + " GENERATED REPORT " + "="*20 + "\n")
    print(final_report)

Place an image named field_sample.jpg in the same directory as your script and run it:

python reporter_agent.py
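
The script reads the reply with `response.json()["message"]["content"]`, which raises a `KeyError` if the body has a different shape. A slightly more defensive sketch follows. It assumes the non-streaming `/api/chat` response is a JSON object with a `message` object holding `content`, and that failures may come back as an object with an `error` field; `extract_reply` is a hypothetical helper name, not part of Ollama or `requests`.

```python
def extract_reply(body: dict) -> str:
    """Return the assistant text from a parsed, non-streaming /api/chat response."""
    if "error" in body:
        # Assumed error shape: {"error": "<description>"}
        raise RuntimeError(f"Local engine reported an error: {body['error']}")
    message = body.get("message") or {}
    content = message.get("content")
    if not isinstance(content, str) or not content.strip():
        raise ValueError("Response contained no assistant text")
    return content


if __name__ == "__main__":
    sample = {"message": {"role": "assistant", "content": "### Report"}, "done": True}
    print(extract_reply(sample))  # ### Report
```

Inside `generate_field_report`, this could be called as `extract_reply(response.json())`, with the two exceptions handled next to the existing connection error.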

The Horizon is Local

The era of choosing between intelligence and privacy is officially over. By packing high-tier multimodal reasoning into a compressed, laptop-friendly footprint, Gemma 4 proves that the future of developer innovation doesn't require a massive cloud budget or an internet connection.

What's Next?

Now that you have your baseline local engine running via Ollama, here are three ways to level up your agent:

Build a Graphical Interface: Wrap this script into a clean interactive dashboard using Python's gradio library, allowing you to drag-and-drop images directly from your web browser window.
Give Your Agent Local Tools: Implement local function calling so your offline reporter can automatically write markdown outputs to a specific local folder or parse system telemetry logs without an internet connection.
Advanced Territory - Unlock Native Audio: Gemma 4 12B contains structural support to process raw audio arrays directly. While standard local wrappers like Ollama focus heavily on image+text pipelines today, advanced developers can look at lower-level C++ engines such as llama.cpp to stream audio waveforms straight into the model's unified layers.
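
As a starting point for the local-tools idea, here is a minimal sketch of a function that saves a generated report to a local folder. It is plain Python rather than a full function-calling setup; `save_report`, the `reports` folder name, and the filename pattern are all assumptions made for illustration.

```python
from datetime import datetime
from pathlib import Path


def save_report(report_markdown: str, out_dir: str = "reports") -> Path:
    """Write the report to a timestamped .md file and return the file's path."""
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)  # create the folder on first use
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    path = folder / f"field_report_{stamp}.md"
    path.write_text(report_markdown, encoding="utf-8")
    return path


if __name__ == "__main__":
    saved = save_report("### FIELD DISPATCH\n**Status:** Draft\n")
    print(f"Report written to {saved}")
```

In reporter_agent.py, the string returned by `generate_field_report` could be passed straight to `save_report` after it is printed. Note that two reports saved within the same second would share a filename, so a real tool would want a more unique naming scheme.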

Follow for more content!
