Okeke Chukwudubem

I Ran an AI Model on My Phone. No Cloud. No API Keys. Just Gemma 4 and Termux.

Gemma 4 Challenge: Write about Gemma 4 Submission

The first time an AI model runs on your own device, something shifts in your brain.

It's not the speed, though seven tokens per second on a phone is respectable. It's not the convenience, though never touching an API key again is liberating. It's the quiet realization that the computer in your pocket is no longer just a client. It's a server. It's a peer. And the cloud, for the first time, is optional.

I'm a software engineering student at UNIZIK in Anambra, Nigeria. I build on a cracked iPhone 7 and an aging Android phone. I don't have a GPU. I don't have cloud credits. What I have is a stubborn belief that my location shouldn't determine my access to the most powerful technology of our generation.

On May 6th, 2026, Google's Gemma 4 made that belief feel less like hope and more like a specification sheet.

This is a practical guide to running Gemma 4 locally on a phone. Not a theoretical overview. Not a benchmark comparison. A step-by-step walkthrough of what I did, what worked, what broke, and why it matters for every developer who's ever been locked out by a loading spinner.

Step 1: Pick Your Model

Gemma 4 isn't one model. It's a family of four, spanning from ultra-mobile to workstation-class. For phone deployment, two variants matter.

The E2B, with 2.3 billion effective parameters and a 128K context window, is designed for phones, edge devices, and always-on assistants. The E4B, with 4.5 billion effective parameters and the same 128K context window, is built for phones and laptops, handling offline apps and more complex reasoning.

For this guide, I used E2B. It's the lightest variant that still delivers meaningful performance. If you have a phone with 12GB+ RAM, try E4B.
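
Before you commit to a multi-gigabyte download, it's worth checking how much memory your phone actually has. In a stock Termux session, reading /proc/meminfo works without installing anything extra:

grep -E 'MemTotal|MemAvailable' /proc/meminfo

If MemTotal is comfortably above 12 million kB (roughly 12GB), E4B is worth a try; otherwise, stay with E2B.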

Step 2: Set Up Your Environment

You need Termux, a terminal emulator and Linux environment for Android that doesn't require root. Download it from F-Droid, not the Play Store; the Play Store version is outdated. Once installed, open Termux and run these commands one by one:

pkg update && pkg upgrade
pkg install python git cmake gcc

This installs Python, Git, and essential build tools. Next, install Ollama, the easiest way to run LLMs locally. Ollama officially supports Linux and macOS. On Termux, we'll use a community-maintained approach:

git clone https://github.com/ollama/ollama.git
cd ollama

Follow the Termux-specific instructions in the repository's README. This step requires patience. You're compiling software on a phone. It will take time. Let it run.
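
For context, here's roughly what the build boiled down to for me. Treat this as a sketch under assumptions, not a recipe: Ollama is a Go project, so I'm assuming a plain CPU-only Go build, and the README may list extra steps (older versions needed go generate ./... first):

pkg install golang
go build .

If it compiles, you'll have an ollama binary in the repo directory. Start the server with ./ollama serve in one Termux session and run the remaining commands in a second session.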

Step 3: Pull and Run Gemma 4

Once Ollama is installed, pulling the Gemma 4 model is a single command:

ollama pull gemma4:2b

This downloads the E2B variant. The download is several gigabytes, so connect to WiFi, not mobile data. Once pulled, you run it with:

ollama run gemma4:2b

You're now talking to an AI model running entirely on your phone. No internet required after the initial download. No API keys. No per-token billing. No data leaving your device.

I tested it with a simple prompt: "Explain recursion to a beginner using only pidgin English." It responded with a surprisingly clear explanation that mixed pidgin with technical accuracy. It wasn't perfect; it stumbled on a complex follow-up about tail recursion. But it was running on a phone. My phone. Offline.
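
You don't have to stay in the interactive chat, either. Ollama also accepts a prompt as a command-line argument, which is handy for scripting:

ollama run gemma4:2b "Explain recursion to a beginner using only pidgin English."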

Step 4: Expose It to Your Local Network

Here's where it gets interesting. Ollama exposes an HTTP API on port 11434. By default it only listens on localhost, but once you bind it to all interfaces, other devices on your WiFi network can send requests to your phone's IP address at that port. Your phone becomes a local LLM server.
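
The setting that controls this is the OLLAMA_HOST environment variable. Stop any running instance, then restart the server listening on all interfaces:

OLLAMA_HOST=0.0.0.0 ollama serve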

On your laptop (or another phone) connected to the same WiFi, point a browser at your phone's IP address on port 11434 to confirm the server is reachable, then send a POST request with a prompt. The model responds. The phone, sitting on your desk, does the computation and returns the answer.
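
Here's what that request looks like with curl. The IP address 192.168.1.50 is a placeholder for your phone's actual address (check Android's WiFi settings), and gemma4:2b is the tag we pulled earlier:

curl http://192.168.1.50:11434/api/generate -d '{
  "model": "gemma4:2b",
  "prompt": "Summarize the benefits of on-device AI in two sentences.",
  "stream": false
}'

Setting stream to false returns a single JSON object with the generated text in its response field, instead of a stream of partial tokens.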

You've just built a private, offline AI server from a phone. No cloud bill. No privacy concerns. No "check your connection" errors.

Step 5: Handle the Limits

Let me be honest about what breaks. First, thermal throttling. Running inference on a phone generates heat. After about 20 minutes of continuous use, my phone got noticeably warm and response times slowed. For production use, you'd need to batch requests and give the device breathing room.

Second, Android's memory management is aggressive. If you switch away from Termux for too long, the OS may kill the Ollama process to free RAM. One developer reported his model survived overnight when he kept the phone plugged in and Termux in the foreground.

Third, speed. I got about 7-8 tokens per second for text generation on an Oppo Find N5 with 16GB of RAM. On a device with 4GB or 6GB, expect 3-5 tokens per second. That's usable for chat, but not for real-time applications.
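
The first two problems have partial workarounds. Termux ships a wake-lock command that asks Android to keep the session alive, and pacing your requests gives the chip time to cool between generations. A sketch (the two prompts and the 30-second pause are arbitrary; tune them for your device):

termux-wake-lock
for prompt in "What is Termux?" "What is Ollama?"; do
  curl -s http://localhost:11434/api/generate \
    -d "{\"model\": \"gemma4:2b\", \"prompt\": \"$prompt\", \"stream\": false}"
  sleep 30
done

Run termux-wake-unlock when you're finished. Neither trick beats physics: if the phone is hot, let it rest.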

What This Unlocks

Once you've run a model locally, you start seeing possibilities everywhere. A chatbot for a local business that answers FAQs without internet. An offline document analyzer that reads PDFs and extracts key points. A coding assistant that works during network blackouts. A classroom tool for schools without reliable connectivity.

I'm building Dexter Nova, an AI assistant for small businesses in Anambra. The first version targets Gemma 4 E2B running on a dedicated Android device. No API calls. No subscription. Just a phone on a shelf, answering customer questions 24/7.

One developer, John Fiewor, built GradrAI, an assessment platform for African educators. When schools told him "we don't have reliable internet," he ripped out the cloud dependency and replaced it with Gemma 4 E4B running locally on a teacher's laptop. The entire pipeline (exam generation, grading, feedback) now runs offline.

This is the real promise of Gemma 4. Not a higher benchmark score. Not a flashy demo. The quiet, radical act of making AI work where the cloud cannot reach.

This post is my entry for the DEV.to Gemma 4 Challenge. Think local. Build real.
