Gemma4 based local Mac assistant in Rust

#gemmachallenge #gemma #ai #rust

Gemma 4 Challenge: Build With Gemma 4 Submission

This is a submission for the Gemma 4 Challenge: Build with Gemma 4

What I Built

mac-agent-rust is an early-stage local AI agent that controls your Mac through
natural language — running entirely on-device, no cloud, no API keys.

It is under active development. Right now it can:

navigate Safari tabs
Draft and compose mails in the Mail app
Create notes in Apple Notes
Display system notifications
Get and set system volume

You type what you want in plain English. The agent writes the AppleScript, runs it
on your machine, reads the result, and responds. Everything stays local.

Demo

Code

github.com/Tejas1Koli/mac-agent-rust

Built with:

Rust — agent loop, tool execution, history management
Rig — LLM client and tool calling framework based on rust
Ollama — local model inference
osascript — AppleScript execution via macOS native runtime

How I Used Gemma 4

I used Gemma 4 E4B MLX — the 4 billion parameter multimodal variant running via
Ollama's MLX backend, optimized specifically for Apple Silicon. On an M2 MacBook Air
it runs fast with no fan spin, no internet, and no API key — pure on-device inference
using the unified memory architecture.

The agent runs Gemma 4 E4B in a tool-calling loop: receive task → write AppleScript
→ call the tool → read output → respond. The MLX backend gets significantly better
token throughput than llama.cpp on Apple Silicon, which matters for a conversational
agent where latency is noticeable.

E4B was the right fit — big enough to reliably generate valid AppleScript and follow
strict tool-call formatting, small enough to run locally without breaking a sweat on
consumer Mac hardware. macOS automation tasks are narrow and structured, not
open-ended reasoning problems, so you don't need a 31B model to get good results.

Why Rust

Most AI agents are Python. I wanted a single binary, proper error handling around
arbitrary system command execution, and Tokio's async runtime for process timeouts.
Rig made tool-calling feel native — a tool is just a trait, schema mismatches are
caught at compile time. cargo install and it runs, no virtualenv, no runtime.

try

cargo install mac-agent-rust

Limitations

Small models struggle with tool call formatting — Gemma 4 E4B occasionally returns empty responses or malformed tool calls, requiring retries
No automatic retry on error — when a script fails the model tends to report the error to the user instead of fixing and retrying the script
Context degradation — long conversations cause the model to lose track of tool results and contradict correct output it just received
Tool result confusion — the model sometimes receives correct output but claims the script failed, especially when results are wrapped in JSON instead of plain text
Spotify and Music app — track info only works when something is actively playing, errors out when paused with no graceful handling
Safari JavaScript — requires "Allow JavaScript from Apple Events" to be manually enabled in Safari's Developer settings, not obvious to first time users
History resets every session — no persistence between runs, agent has no memory of previous conversations

DEV Community