Mari

Posted on May 17

I Ran Google's Gemma 4 on a 10-Year-Old Laptop With No GPU. Here Is What I Found.

#devchallenge #gemmachallenge #gemma

This is a submission for the Gemma 4 Challenge: Write About Gemma 4

I knew local LLMs existed, but I did not know they could handle demanding work. A few days ago, I installed Google's Gemma 4 E2B on a 10-year-old HP EliteBook: a 2016 Intel i7, 16 GB of RAM, and no GPU beyond integrated graphics.
I honestly didn't know what to expect, and I was blown away. I assumed it would be very slow, but it wasn't. I also assumed it would give me generic responses, and I was completely wrong.
This is what I found.

The Setup

Google's Gemma 4 was released in April 2026. For anyone unfamiliar with it, Gemma 4 is a family of open-weight models (open-weight simply means anyone can download the model and run it themselves, with no internet needed once it's installed). The family has four models: the E2B and E4B (designed mainly for phones and laptops with low computational power), and the 26B and 31B (for high-end systems and GPUs). I used E2B, a 7.2 GB model file, because that's what my system could handle.
This runs on Ollama, a tool for running AI models on a computer.
My setup: Ubuntu 24.04, Ollama 0.23.3, Gemma 4 E2B.

The Install

Before I show you what Gemma 4 can do, I should be honest about what it took to get it running. It was not quick.
I started with the standard Ollama install command. The download got to 98.9% and then died with a protocol error. My connection is not the most stable, and the install script could not handle the dropouts. I saw a lot of "Could not resolve host: github.com" messages along the way.
The fix was to download the install script separately first, with retries built in:

curl -L --retry 5 --retry-delay 3 -o install.sh https://ollama.com/install.sh
sh install.sh

The --retry flag tells curl to keep trying when the connection drops. That part worked. Ollama was installed eventually, after about two hours of patience.
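For anyone who would rather script the same idea, here is a minimal Python sketch of retry-with-delay. It is a simplified stand-in for what curl's --retry and --retry-delay do, not a copy of curl's behavior. The names with_retries and flaky_download are hypothetical, and the download is simulated so the snippet runs without a network.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(action: Callable[[], T], retries: int = 5, delay: float = 3.0) -> T:
    """Run action; on a network-style error, wait and try again up to `retries` more times."""
    attempt = 0
    while True:
        try:
            return action()
        except OSError:
            # ConnectionError and urllib's URLError are both subclasses of OSError.
            if attempt >= retries:
                raise  # out of retries: surface the last error
            attempt += 1
            time.sleep(delay)


# Simulated download that drops twice before succeeding (assumption for the demo).
calls = {"count": 0}


def flaky_download() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated dropout")
    return "install.sh contents"


result = with_retries(flaky_download, retries=5, delay=0.0)
print(result, "after", calls["count"], "attempts")
```

In real use, the action would be a function that fetches the file, and the delay would stay at a few seconds so a dropped connection has time to recover.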
Then I pulled the model:

ollama pull gemma4:e2b

That was another two hours for 7.2 GB. Total time from running the first command to my first real conversation with Gemma: about four hours.
Here is the part that makes the four hours worth it. After this, no internet is ever needed again. The model lives on my disk. No API key, no subscription, no monthly bill. Pay the install cost once, use it forever.

The First Conversations

I started with a simple question: how does Gemma verify information?
The answer surprised me. Gemma did not pretend to be a search engine or a fact-checker. It admitted plainly that it does not verify anything. It explained that it is a pattern-matching engine that predicts likely sequences of words based on its training data, and it used the word "hallucination" to describe its main limitation.
In its own words:

I am an extremely sophisticated pattern-matching engine. I generate information based on the knowledge I absorbed during training, not by verifying facts against an external database. Therefore, it is crucial to treat my output as a starting point or a resource, and always verify critical information using reliable, external sources.

That kind of honesty from a small local model was not what I expected.
One thing I noticed quickly: Gemma's default style is very structured. Lots of headers, lots of bullet points, lots of bold. For some tasks, this is great. For casual conversation, it feels like the model is filing a report. Worth knowing if you plan to use it.

Code, the Real Test

This is the part that mattered to me as a developer. I asked Gemma to write a Python function that takes a list of integers and returns the second-largest unique value, handling edge cases.
The response was better than I expected. Gemma first showed its reasoning, working through edge cases before writing a single line of code: empty list, duplicates, negatives, and all elements the same. Then it produced this:

```python
from typing import List, Optional


def find_second_largest_unique(numbers: List[int]) -> Optional[int]:
    """
    Finds the second largest unique value in a list of integers.
    """
    if not numbers:
        return None
    unique_numbers = set(numbers)
    sorted_unique = sorted(list(unique_numbers), reverse=True)
    if len(sorted_unique) < 2:
        return None
    return sorted_unique[1]
```

Proper type hints. A real docstring. Comments explaining the choices. After the function, it added six test cases covering every edge case I would have written, and finished with a complexity analysis: O(N log N) time, O(N) space.
I would accept this in a code review. From a developer working under me, this is solid junior-to-mid-level work. From a 7.2 GB file running on a 2016 laptop with no GPU, it is something else.
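To give a feel for the edge cases involved, here is a small set of checks against the same logic. These are hypothetical examples written for this post, not the six test cases Gemma produced. The function is repeated in compact form so the snippet runs on its own.

```python
from typing import List, Optional


def find_second_largest_unique(numbers: List[int]) -> Optional[int]:
    """Compact equivalent of the function above: dedupe, sort descending, take index 1."""
    unique_sorted = sorted(set(numbers), reverse=True)
    return unique_sorted[1] if len(unique_sorted) >= 2 else None


# Illustrative checks only (assumed inputs, not Gemma's own tests).
assert find_second_largest_unique([]) is None            # empty list
assert find_second_largest_unique([7]) is None           # single element
assert find_second_largest_unique([4, 4, 4]) is None     # all elements the same
assert find_second_largest_unique([3, 1, 3, 2]) == 2     # duplicates of the maximum
assert find_second_largest_unique([-5, -1, -3]) == -3    # negatives
assert find_second_largest_unique([10, 20]) == 10        # exactly two values
print("all checks passed")
```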
The timing on this response: about 5 to 15 seconds before the first word appeared, then 30 to 60 seconds for the full response, including the thinking trace, the code, the tests, and the complexity analysis. Fast enough to read along as it is typed. I never sat there waiting.
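Timings like these can also be measured rather than eyeballed. The sketch below is one possible approach, not how the numbers above were collected. It assumes Ollama's local HTTP API at its default address (http://localhost:11434/api/generate), which streams one JSON object per line with the text in a "response" field. The names time_stream and ollama_chunks are hypothetical. With a thinking model, the first non-empty "response" chunk may only arrive after the thinking trace, so the first number can include thinking time.

```python
import json
import time
import urllib.request
from typing import Callable, Iterable, Iterator, Tuple


def time_stream(chunks: Iterable[str],
                clock: Callable[[], float] = time.perf_counter) -> Tuple[float, float, str]:
    """Return (seconds to first non-empty chunk, total seconds, full text)."""
    start = clock()
    first = None
    parts = []
    for chunk in chunks:
        if first is None and chunk:
            first = clock() - start
        parts.append(chunk)
    total = clock() - start
    return (first if first is not None else total, total, "".join(parts))


def ollama_chunks(prompt: str, model: str = "gemma4:e2b",
                  url: str = "http://localhost:11434/api/generate") -> Iterator[str]:
    """Yield text pieces from a local Ollama server (assumed default address and endpoint)."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": True}).encode("utf-8")
    request = urllib.request.Request(url, data=body,
                                     headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        for line in response:
            if line.strip():
                yield json.loads(line).get("response", "")


# Offline demo with a fake clock, so the snippet runs without a server.
ticks = iter([0.0, 5.0, 40.0])
first, total, text = time_stream(["Hello", ", ", "world"], clock=lambda: next(ticks))
print(first, total, text)
```

Against a running server, the call would look like time_stream(ollama_chunks("your prompt here")).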

The Long Context Test

Gemma 4 advertises a 128K token context window, which is roughly enough to hold a short novel. I wanted to see how that holds up on real hardware.
I downloaded "The Yellow Wallpaper" by Charlotte Perkins Gilman from Project Gutenberg, about 9,000 words, and asked Gemma to summarize it in three sentences.
The first attempt took 10 minutes, and the summary it gave me was not about the story. It was about copyright and Project Gutenberg's terms of use. The boilerplate header at the top of the file had taken over.
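That kind of takeover can be avoided by cutting the file down to the story before prompting. Below is a minimal sketch, assuming the file uses the "*** START OF THE PROJECT GUTENBERG EBOOK ..." and "*** END OF THE PROJECT GUTENBERG EBOOK ..." marker lines that Project Gutenberg plain-text files commonly carry. Older files may word the markers differently, so treat the patterns as a starting point. The function name and sample text are made up for illustration.

```python
import re

# Marker patterns are an assumption about the file's format; adjust if yours differs.
START = re.compile(r"^\*\*\* ?START OF (THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.MULTILINE)
END = re.compile(r"^\*\*\* ?END OF (THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.MULTILINE)


def strip_gutenberg_boilerplate(text: str) -> str:
    """Keep only the text between the START and END marker lines, if both are found."""
    start = START.search(text)
    end = END.search(text)
    if not start or not end or end.start() < start.end():
        return text.strip()  # markers missing or out of order: leave the content alone
    return text[start.end():end.start()].strip()


sample = (
    "The Project Gutenberg eBook of Example\n"
    "License and terms of use go here.\n"
    "*** START OF THE PROJECT GUTENBERG EBOOK EXAMPLE ***\n"
    "Story text goes here.\n"
    "*** END OF THE PROJECT GUTENBERG EBOOK EXAMPLE ***\n"
    "More license text goes here.\n"
)
story = strip_gutenberg_boilerplate(sample)
print(story)
```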
I stripped out the license text and tried again with just the story, about 6,000 words. This time it took 5 minutes, and the summary was real:

The narrator, confined by oppressive circumstances, engages in a desperate act of rebellion by tearing down or removing a portion of the wallpaper, which reveals a hidden reality. This act leads to a further exploration of the narrator's confinement, paranoia, and a climactic confrontation with the perceived prison. Ultimately, the story ends with the narrator's escape, emphasizing a violent break from her oppressive environment.

A literature student would accept this.
So the long context works. But two things matter. First, on CPU-only hardware, every thousand words of input has a real-time cost. Second, what you put in matters as much as how much you put in. Feed it boilerplate, and you get a summary of boilerplate.
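To put a rough number on that cost, the two runs above can be turned into seconds per thousand words of input. This is back-of-the-envelope arithmetic on the approximate figures in this post (about 9,000 words in 10 minutes, about 6,000 words in 5 minutes). The elapsed time also includes generating the summary, so read the results as ballpark only. The function name is hypothetical.

```python
def seconds_per_thousand_words(words: int, minutes: float) -> float:
    """Elapsed seconds divided by the input size in thousands of words."""
    return minutes * 60 / (words / 1000)


# Approximate figures taken from the two runs described above.
runs = {"with boilerplate": (9000, 10), "story only": (6000, 5)}
for label, (words, minutes) in runs.items():
    rate = seconds_per_thousand_words(words, minutes)
    print(f"{label}: about {rate:.0f} s per 1,000 words")
```

On these figures the longer input cost more per word than the shorter one, though two runs are too few to say why.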

Where It Broke: The Multilingual Claim

The Gemma 4 documentation says the model supports more than 140 languages. I speak Nigerian Pidgin and Igbo, so I tested both.
I asked Gemma to translate this sentence into Nigerian Pidgin: "I'm going to the market to buy some food because there's nothing in the house."
Gemma's best option:

I go market to buy food because there na nothing for the house.

How a Nigerian actually says it:

I wan go market go buy food, notin dey house.

The difference is not small. Gemma used real Pidgin words but glued them together with English grammar. It kept "to buy" instead of the Pidgin serial verb "go buy." It used "because" as a connector when Pidgin would just lean on the comma. The result is recognizable to a Pidgin speaker, but it is not how anyone actually talks.
Igbo was worse. I asked for the same sentence in Igbo, and Gemma gave me this:

A na-ekwu n'ịde maka ịzụ ụfọdụ n'aka n'ụlọ, ka ọ bụla nọ n'ụlọ.

The correct version:

Anam aga ahịa igote nri maka na onweghị ihe dị na ụlọ.

The Gemma version is not just unnatural. It uses words that do not mean what the model thought they meant. In its own thinking trace, Gemma defined "ahụ" as "I," when it actually means "body." It defined "nwere" as "to buy," when it really means "to have." The model was confidently making up definitions and then using them to translate.

"Supports 140 languages" is doing a lot of work in that marketing line. For low-resource African languages, my honest finding is that the model does not just struggle. It fabricates.

Where It Worked: The Image Test

To balance the multilingual finding, I tried something Gemma 4 is supposed to be strong at: looking at images.
I downloaded a photo of a mountain landscape at golden hour, with peaks rising above a sea of clouds. I asked Gemma to describe what it saw.
The processing took about 8 minutes, which is slow but expected on a CPU. The description itself was worth the wait:

This is a breathtaking, panoramic photograph that captures a dramatic mountain landscape seen from a high vantage point, likely during sunrise or sunset. The image is defined by a striking contrast between the dark, rugged foreground and a vast, ethereal sea of clouds below.

It went on to describe the three layers of the composition, the snow on the peaks, the warm tones on the high mountains versus the cool blues higher in the sky, and the dark scrubby terrain in the foreground. All of it was accurate. A blind person reading that description could form a real picture of the actual photo.
So the multimodal feature works, and it works well. It is just slow on a laptop like mine.
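For readers who want to script an image prompt, here is a minimal sketch of one way to do it. It is not necessarily how the test above was run. It assumes Ollama's local /api/generate endpoint at its default address, which accepts base64-encoded images in an "images" list and returns the text in a "response" field. The function names and the file path in the usage note are hypothetical.

```python
import base64
import json
import urllib.request


def build_image_request(image_bytes: bytes, prompt: str, model: str = "gemma4:e2b") -> bytes:
    """Build a JSON request body with one base64-encoded image attached."""
    payload = {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for a single JSON reply instead of a stream
    }
    return json.dumps(payload).encode("utf-8")


def describe_image(path: str, url: str = "http://localhost:11434/api/generate") -> str:
    """Send an image file to a local Ollama server (assumed address) and return the reply."""
    with open(path, "rb") as handle:
        body = build_image_request(handle.read(), "Describe this image.")
    request = urllib.request.Request(url, data=body,
                                     headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


# Offline check of the request body, using placeholder bytes instead of a real photo.
body = build_image_request(b"abc", "Describe this image.")
decoded = json.loads(body)
print(decoded["model"], decoded["images"])
```

With a server running, describe_image("mountain.jpg") would send a real file. On CPU-only hardware, expect the call to block for minutes.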

The Honest Moment

One thing I want to admit before I wrap up. While I was running the math test, I gave Gemma a word problem about trains traveling between two cities.

The answer came back referencing the cities I had typed. I looked at the output and at first thought Gemma had swapped my cities for American ones. I was about to write a section about training data bias.
Then I looked at my own prompt. I had typed the American cities myself and forgotten. Gemma was just using what I gave it.
The reason I am including this: when something seems wrong with AI, it is worth checking your own input first. A lot of "AI is bad at X" takes are quietly user error.

What This Means

Eight years ago, running a useful language model on a laptop without a GPU was a fantasy. Two years ago, it was possible, but the output was bad enough that you would only do it as a hobby. Now I am running a 7.2 GB file on a 2016 business laptop, and it is writing code I would accept from a junior engineer, summarizing 6,000-word documents, and describing photographs with the eye of a writer.
It is not perfect. The long context is slow on the CPU. The multilingual support breaks for low-resource languages. The image processing takes minutes, not seconds. But "not perfect" is a different conversation from "not useful."
The gap between cloud AI and what runs on ordinary hardware has closed faster than I realized. For a developer without an API budget, without reliable internet, or without a workstation, that closing gap is not just a technical detail. It is the difference between using these tools and not.

And because Gemma 4 is released under the Apache 2.0 license, anything I build with it is mine to ship. No usage fees, no rate limits, no terms of service that change next year.

How to Try It Yourself

If you have a laptop with at least 8 GB of RAM and a few hours of patience, you can try this today. Three commands:

curl -fsSL https://ollama.com/install.sh | sh
ollama pull gemma4:e2b
ollama run gemma4:e2b

The first installs Ollama. The second pulls the model. The third starts a chat.
If your connection is flaky like mine, swap the first command for the retry-friendly version I used:

curl -L --retry 5 --retry-delay 3 -o install.sh https://ollama.com/install.sh
sh install.sh

Then ask it something hard. See what surprises you.

If you try it, I'd love to hear what you build with it. Drop a comment with what worked, what didn't, or what surprised you. The local AI conversation is just getting started.