This is a submission for the Gemma 4 Challenge: Write About Gemma 4
I've been using GPT models as my OpenClaw daily driver for a while now. I'm still a believer that local models will rule in the long run, though, so here's my experience using Gemma 4 models on a base Mac (16 GB).
## Picking the model
Their Ollama README has more details on each, but here's the gist of what matters:
- `gemma4:e2b`: Effective 2B (E2B), 7.2 GB
- `gemma4:e4b`: Effective 4B (E4B), 9.6 GB
- `gemma4:26b`: 26B (Mixture of Experts model with 4B active parameters), 18 GB
- `gemma4:31b`: 31B (dense), 20 GB
Wish I could fit 26B or 31B in memory, but those are no-gos for now 🥲 Here are the Ollama stats for a simple weather prompt on my machine using e2b and e4b:
| Metric | e4b | e2b |
|---|---|---|
| Total duration | 11.098125375s | 6.480901542s |
| Load duration | 146.231625ms | 147.438834ms |
| Prompt eval count | 30 token(s) | 30 token(s) |
| Prompt eval duration | 122.074459ms | 73.241834ms |
| Prompt eval rate | 245.75 tokens/s | 409.60 tokens/s |
| Eval count | 324 token(s) | 339 token(s) |
| Eval duration | 10.752557162s | 6.124391178s |
| Eval rate | 30.13 tokens/s | 55.35 tokens/s |
Also a quick (prompted) chart comparison of those numbers:
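As a quick sanity check, the eval rates in that table are just token counts divided by durations; a tiny snippet reproducing them from the numbers above:

```python
# Reproduce the "Eval rate" row from the Ollama stats table above.
stats = {
    "e4b": {"eval_count": 324, "eval_duration_s": 10.752557162},
    "e2b": {"eval_count": 339, "eval_duration_s": 6.124391178},
}

for model, s in stats.items():
    rate = s["eval_count"] / s["eval_duration_s"]
    print(f"{model}: {rate:.2f} tokens/s")  # e4b: 30.13, e2b: 55.35
```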
I chose e4b because its responses felt better, both fit in memory, and I don't mind the speed difference: e2b is clearly faster, but e4b is very usable. I can't deny the "a bigger model should be smarter than a smaller one" bias also had an impact on the decision.
## Walkthrough/Checklist
The web is full of OpenClaw setup guides, videos, etc., so instead of yet another one, here's a checklist that might help you out:
- Install Ollama
- `ollama run gemma4:e4b` to pull and do a quick test run of the model
- `openclaw configure` to set up Ollama with the `gemma4:e4b` model enabled
- Wrestle OpenClaw's configs a bit to have your default model properly configured
- Get the skill (keep reading) installed on your instance
## Track running stats skill
The skill I wrote is published on ClawHub for you to install (or simply read); for quick context, it keeps a `runs.md` history that you can query in natural language.
Now I can send a message like this:
> I just finished a run:
> 1mi
> 7:55
And it keeps track of them over time: as soon as I finish a run, I simply text my claw what I clocked.
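Under the hood there's not much magic; a minimal sketch of what the skill does (the `runs.md` line format here is my assumption, not necessarily what the published skill uses):

```python
from datetime import date

def log_run(distance: str, time: str, path: str = "runs.md") -> None:
    """Append one run to the markdown history that later gets queried."""
    with open(path, "a") as f:
        f.write(f"- {date.today().isoformat()} | {distance} | {time}\n")

# The message above would turn into one history line:
log_run("1mi", "7:55")
```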
## Analyzing the data
I uploaded 48 runs I had saved in my notes and can now query them in plain text. Here's a bit of what gemma4:e4b came back with:
## Gemma 4 Conclusions
I wasn't able to get e4b to generate a plotted chart of all my times. I guess that's one of its limitations: a frontier model might have written a script and generated the plot on the fly from the available data, right?
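For context, the script such a model would need to write isn't long; a hedged sketch of the pace math involved, using made-up times rather than my actual 48 runs:

```python
def pace_to_seconds(pace: str) -> int:
    """'7:55' -> 475 seconds."""
    minutes, seconds = pace.split(":")
    return int(minutes) * 60 + int(seconds)

def seconds_to_pace(total: float) -> str:
    """475 -> '7:55'."""
    return f"{int(total // 60)}:{round(total % 60):02d}"

# Illustrative mile times only (not my real data)
paces = ["7:55", "8:10", "7:48"]
avg = sum(pace_to_seconds(p) for p in paces) / len(paces)
print(seconds_to_pace(avg))  # prints 7:58
```

From there, plotting is just a matter of feeding those seconds into any charting library.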
I don't intend to use these for coding or anything too complex (they're not built/trained for that in the first place), so for general-purpose OpenClaw use, i.e. calling skills and doing light work/analyses, they feel pretty great. FOMO still hits hard, though; I wish I could run the two beefier models.
My claw makes mistakes finding the skill, and there are other snags that are kind of annoying; I'm unsure if that's because of OpenClaw itself or if a smarter model would fix it. I did run into similar issues with GPT models, which is why I'm hesitant to blame the model. I want to give pi a try for a more minimalistic automation setup.
Overall, though, I'm quite impressed, so I'm keeping Gemma 4 models as my daily driver for a while. What I like most is that all of the above happened without a single API request ever hitting a third party: no personal data going out and no invoice coming in for these chats. Kudos to the Google AI team behind this open model family 👏🏼
Pura Vida!


