Nitin Kalra

Posted on May 24

I Used Gemma 4 as a Private Log Analyst for App Crashes

#devchallenge #gemmachallenge #gemma


This is a submission for the Gemma 4 Challenge: Write About Gemma 4

Most AI debugging workflows are still on demand.

A year ago, that usually meant copying a stack trace, pasting it into a large model, and asking what went wrong.

Now the workflow is better. I can ask Codex, Claude Code, or another coding agent to inspect the repo, read the relevant files, explain the failure, and even make the fix.

But the first step is still mostly the same:

Something breaks. I notice the stack trace, Gradle error, crash line, or suspicious log message. Then I bring that evidence to the agent and ask it to start from there.

That works, but it is still reactive.

The problem is that this is not how crashes actually happen during development.

Crashes happen while the app is running.

Gradle failures happen while I am switching branches.

Warnings pile up before the actual failure.

Sometimes the real issue is not the red stack trace. It is a swallowed exception inside a try/catch block, a suspicious warning 200 lines earlier, or a lifecycle message that only makes sense when you look at the surrounding code.

So I built around a different idea:

Gemma 4 runs locally on my dev laptop as a continuous log analyst.

Not a replacement for a large model.

Not a chatbot I manually open after everything breaks.

Just a small local model watching logs, clustering noise, catching suspicious patterns, and telling me when something deserves attention.

Why Local Matters For Logs

Logs are messy, repetitive, and often private.

Android logs can contain package names, API paths, device details, feature flags, user identifiers, request IDs, internal class names, and business logic clues. Gradle output can expose project structure, dependency names, local file paths, signing configuration mistakes, and CI environment details.

That makes logs a strange fit for always-on cloud analysis.

For one-off debugging, I am comfortable invoking a stronger hosted model when I choose to. But I do not want every log line from my development machine streamed to a remote API just in case something interesting happens.

This is where Gemma 4 becomes useful in a very specific way:

It can run locally.
It can be cheap enough to call repeatedly.
It can inspect noisy text without needing a perfect prompt every time.
It can sit near the developer loop instead of behind a manual copy/paste step.

That changes the product shape.

A large cloud model is a consultant.

A local Gemma 4 process can be a background reviewer.

Running Gemma 4 Locally With Ollama

For this workflow I used Ollama:

ollama run gemma4:26b

I chose the 26B Mixture-of-Experts model because my laptop has enough headroom for it, and this task benefits from more than tiny-model pattern matching.

The smaller Gemma 4 models are attractive for phones, Raspberry Pi projects, and very low-friction local apps. But a log analyst is doing a slightly different job. It needs to read noisy context, connect earlier warnings to later crashes, decide whether to use tools, and produce structured findings without constantly interrupting me.

That is why I picked the MoE option: the challenge describes it as a highly efficient 26B model designed for high-throughput, advanced reasoning. On my machine, Ollama reports it as gemma4:26b, a 25.8B Q4_K_M local model, about 18GB on disk.

The watcher does not call ollama run directly each time. I keep the model warm, then call Ollama's local HTTP API:

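const response = await fetch("http://127.0.0.1:11434/api/chat", {
  method: "POST",
  headers: { "content-type": "application/json" },
  body: JSON.stringify({
    model: "gemma4:26b",
    stream: false, // one complete JSON reply instead of a token stream
    messages,
    tools,
    options: {
      temperature: 0.1 // low temperature keeps the structured output stable
    }
  })
});

if (!response.ok) {
  throw new Error(`Ollama request failed: ${response.status}`);
}
// With stream: false, the reply is one JSON object with the assistant turn in `message`.
const { message } = await response.json();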

That keeps the setup simple: local model, local logs, local code, local findings.
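The post does not show how the model is kept warm. One possible approach, sketched here under assumptions, is Ollama's keep_alive request field, which controls how long the model stays loaded after a call. The helper name buildChatBody and the 30-minute value are illustrative, not the settings used in this project.

```javascript
// Sketch only: keep_alive is an Ollama request field, but the "30m" value
// and the helper name are assumptions, not the post's actual settings.
function buildChatBody({ messages, tools = [], keepAlive = "30m" }) {
  return {
    model: "gemma4:26b",
    stream: false,
    keep_alive: keepAlive, // how long Ollama keeps the model loaded after this call
    messages,
    tools,
    options: { temperature: 0.1 }
  };
}

const body = buildChatBody({
  messages: [{ role: "user", content: "Analyze this log window." }]
});
console.log(JSON.stringify(body, null, 2));
```

The resulting object can be passed to JSON.stringify as the body of the /api/chat request shown above.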

What I Built

I built a small local workflow around three inputs:

adb logcat output from an Android app
Gradle build failures
Nearby source files when the log points to a class or line number

The goal was not to make Gemma 4 fix code automatically. The goal was narrower:

Read the logs continuously, ignore obvious noise, identify likely root causes, ask for missing context when needed, and suggest the next debugging step.

The loop looks like this:

adb logcat / Gradle output
        |
        v
small rolling buffer
        |
        v
noise filter + event grouping
        |
        v
Gemma 4 local analysis
        |
        v
structured issue summary
        |--------------------|
        v                    v
bell notification      localhost:3001 viewer
        |
        v
optional source lookup through tools

The important part is the rolling buffer. Instead of sending one isolated stack trace, the local watcher can keep the last few hundred or thousand lines around the failure.

That matters because the most useful clue is often before the crash.
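The "noise filter + event grouping" step in the diagram is not shown in the post. A minimal sketch of what it might look like follows: drop lines that match known noise patterns, then collapse lines that differ only in numbers into one group with a count. The noise patterns, the function names, and the sample "retry" lines are made up for illustration.

```javascript
// Hypothetical noise patterns; a real list would come from the app's own logs.
const NOISE_PATTERNS = [/Choreographer.*Skipped \d+ frames/, /^D\//];

// Replace volatile parts (hex addresses, numbers) so repeats share one key.
function signatureOf(line) {
  return line
    .replace(/0x[0-9a-f]+/gi, "<hex>")
    .replace(/\d+/g, "<n>")
    .trim();
}

// Group non-noise lines by signature, keeping the first line as a sample.
function groupLines(lines) {
  const groups = new Map();
  for (const line of lines) {
    if (NOISE_PATTERNS.some((pattern) => pattern.test(line))) continue;
    const key = signatureOf(line);
    const group = groups.get(key) ?? { sample: line, count: 0 };
    group.count += 1;
    groups.set(key, group);
  }
  return [...groups.values()];
}

const grouped = groupLines([
  "I/Choreographer: Skipped 42 frames!",
  "W/HomeViewModel: retry 1 failed",
  "W/HomeViewModel: retry 2 failed",
  "E/RecyclerView: No adapter attached; skipping layout"
]);
console.log(grouped); // two groups: the retry warning (count 2) and the RecyclerView error (count 1)
```

Sending grouped samples with counts, rather than every repeated line, keeps the prompt short without hiding how often something happened.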

Making The Background Process Noticeable

One problem with a background log analyst is obvious:

If it is running in a separate terminal, how do I notice it found something?

I did not want another terminal tab that quietly fills with text while I am focused on code. So I added two small tools around the model.

The first one is intentionally simple: ring a bell.

When Gemma 4 classifies something as a real finding instead of noise, the watcher can play a short bell sound. Not for every warning. Not for every repeated stack trace. Only when the model thinks the event crosses a threshold:

a new crash signature
a Gradle failure with a clear source location
a swallowed exception that later becomes a visible failure
repeated warnings that are likely connected
a missing file, permission, or dependency that blocks the app

The implementation can be as small as this:

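const { execFile } = require("node:child_process");
const { appendFile } = require("node:fs/promises");

async function ring_bell({ reason }) {
  // Terminal bell (BEL); works even when no sound command is configured.
  process.stdout.write("\u0007");

  // Optional custom sound command taken from the environment.
  if (process.env.GEMMA_BELL_CMD) {
    // Naive split: arguments that contain spaces are not supported.
    const [cmd, ...args] = process.env.GEMMA_BELL_CMD.trim().split(/\s+/);
    execFile(cmd, args, () => {}); // fire and forget; a failed sound should not block the finding
  }

  // Keep an audit trail so the viewer can show whether the bell already fired.
  await appendFile("bell-events.log", `${new Date().toISOString()} ${reason}\n`);
  return { alerted: true, reason };
}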
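ring_bell itself fires every time it is called, so the "not for every repeated stack trace" rule has to live somewhere. The prompt can ask the model to hold back, but a small guard in the watcher is a cheap second line of defense. The sketch below assumes a per-signature cooldown; the 10-minute window, the signature format, and the shouldRing name are all hypothetical.

```javascript
// Sketch: remember when each crash signature last rang the bell.
// The 10-minute cooldown and the signature format are assumptions.
const COOLDOWN_MS = 10 * 60 * 1000;
const lastRung = new Map();

function shouldRing(signature, now = Date.now()) {
  const previous = lastRung.get(signature);
  if (previous !== undefined && now - previous < COOLDOWN_MS) {
    return false; // same issue, still inside the cooldown window
  }
  lastRung.set(signature, now);
  return true;
}

const sig = "IndexOutOfBoundsException@HomeAdapter.onBindViewHolder";
console.log(shouldRing(sig, 0));              // true: first sighting
console.log(shouldRing(sig, 60 * 1000));      // false: repeat one minute later
console.log(shouldRing(sig, 11 * 60 * 1000)); // true: cooldown has passed
```

Calling shouldRing before ring_bell means a crash loop produces one alert instead of dozens.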

The second tool is a tiny local viewer.

I run it on localhost:3001 and open it in Chrome. It shows recent Gemma 4 findings as a small debugging inbox:

latest issue summary
severity
first seen / last seen time
related log lines
suspected root cause
suggested files to inspect
next debugging step
whether the bell already fired for that issue

That made the workflow much more practical. Gemma 4 can run in the background, but I do not have to keep staring at its terminal output. The bell tells me something needs attention. The browser view gives me the short version when I am ready to look.
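The viewer code is not included in the post. As a rough sketch of how small it can be, the following uses only Node's built-in http module. The renderFindings and startViewer names and the summary field are assumptions; the real viewer shows more fields than this.

```javascript
const http = require("node:http");

// Escape text before placing it into HTML; log lines can contain < and &.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Render findings as a plain list; `summary` is a hypothetical field name.
function renderFindings(findings) {
  const items = findings
    .map((f) => `<li><b>${escapeHtml(f.severity)}</b> ${escapeHtml(f.summary)}</li>`)
    .join("\n");
  return `<!doctype html><title>Findings</title><ul>\n${items}\n</ul>`;
}

// Bind to 127.0.0.1 so the inbox is not reachable from other machines.
function startViewer(getFindings, port = 3001) {
  const server = http.createServer((req, res) => {
    res.writeHead(200, { "content-type": "text/html; charset=utf-8" });
    res.end(renderFindings(getFindings()));
  });
  server.listen(port, "127.0.0.1");
  return server;
}
```

A call such as startViewer(() => recentFindings) would serve whatever array the watcher currently holds. Binding to the loopback address keeps the "local findings" promise: nothing is exposed on the network.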

Giving Gemma 4 Small Tools

Logs are useful, but they often point to code.

So I gave the local analyzer a small tool surface:

const tools = [
  {
    type: "function",
    function: {
      name: "read_file",
      description: "Read a local source file by path or basename.",
      parameters: {
        type: "object",
        properties: {
          path: { type: "string" },
          startLine: { type: "number" },
          endLine: { type: "number" }
        },
        required: ["path"]
      }
    }
  },
  {
    type: "function",
    function: {
      name: "ring_bell",
      description: "Notify me when a high-confidence finding needs attention.",
      parameters: {
        type: "object",
        properties: {
          reason: { type: "string" }
        },
        required: ["reason"]
      }
    }
  },
  {
    type: "function",
    function: {
      name: "save_finding",
      description: "Save a structured finding for the localhost viewer.",
      parameters: {
        type: "object",
        properties: {
          finding: { type: "object" }
        },
        required: ["finding"]
      }
    }
  }
];

The most important one is read_file.

When a stack trace includes HomeAdapter.kt:88, Gemma 4 can ask for that file instead of guessing what the adapter does. When a Gradle error points to CheckoutViewModel.kt:44, it can read the surrounding code and give a more grounded suggestion.

I kept the tools deliberately boring.

No automatic edits.

No deleting files.

No running random commands.

Just enough local context to move from "this log looks bad" to "read this file and check this assumption."
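"Boring" also has to hold on the implementation side, because the model chooses the path argument. The sketch below shows one way a read_file handler could stay inside the project directory and cap how much it returns. PROJECT_ROOT, the 200-line cap, and the readFileTool name are assumptions, and the basename lookup mentioned in the tool description is left out.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Sketch of a read_file handler. PROJECT_ROOT and the 200-line cap are assumptions.
const PROJECT_ROOT = process.cwd();
const MAX_LINES = 200;

function readFileTool({ path: requested, startLine = 1, endLine }, root = PROJECT_ROOT) {
  const resolved = path.resolve(root, requested);
  const relative = path.relative(root, resolved);
  // Reject anything that resolves outside the project root, such as "../secrets".
  const escapes =
    relative === ".." || relative.startsWith(".." + path.sep) || path.isAbsolute(relative);
  if (escapes) {
    return { error: `Path is outside the project: ${requested}` };
  }
  if (!fs.existsSync(resolved) || !fs.statSync(resolved).isFile()) {
    return { error: `File not found: ${requested}` };
  }
  const lines = fs.readFileSync(resolved, "utf8").split(/\r?\n/);
  const first = Math.max(1, startLine);
  // Never return more than MAX_LINES, even if the model asks for a larger range.
  const last = Math.min(lines.length, endLine ?? Infinity, first + MAX_LINES - 1);
  return {
    path: relative,
    startLine: first,
    endLine: last,
    content: lines.slice(first - 1, last).join("\n")
  };
}
```

Returning an error object instead of throwing lets the model see why a read failed and ask for a different file. This check does not resolve symlinks, so a stricter version would compare real paths.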

Example 1: The Stack Trace Was Obvious

The easy case is a normal crash:

FATAL EXCEPTION: main
Process: com.example.notes, PID: 18342
java.lang.IllegalStateException: Fragment NotesFragment not attached to a context.
    at androidx.fragment.app.Fragment.requireContext(Fragment.java:967)
    at com.example.notes.NotesFragment.showEmptyState(NotesFragment.kt:118)
    at com.example.notes.NotesFragment$loadNotes$1.invokeSuspend(NotesFragment.kt:92)

Gemma 4 did not need to be brilliant here. The root cause is already in the stack trace.

The useful output was the structure:

{
  "severity": "crash",
  "likely_root_cause": "NotesFragment calls requireContext() after the fragment is detached.",
  "noisy_lines": ["process restart messages", "unrelated Choreographer skipped-frame warning"],
  "next_step": "Check whether loadNotes() completes after onDestroyView() or after navigation away from the fragment.",
  "code_to_read": [
    "NotesFragment.loadNotes",
    "NotesFragment.showEmptyState",
    "Fragment lifecycle around onDestroyView"
  ]
}

That is already more useful than a raw stack trace sitting in a terminal.

It converts the crash into a short debugging task.

Example 2: The Real Problem Was Hidden Above The Crash

The more interesting case was a noisy sequence like this:

W/ConfigRepository: Failed to parse remote config, using defaults
org.json.JSONException: No value for max_items
    at org.json.JSONObject.get(JSONObject.java:398)
    at com.example.app.ConfigRepository.parse(ConfigRepository.kt:61)
    at com.example.app.ConfigRepository.refresh(ConfigRepository.kt:42)

W/HomeViewModel: Config refresh failed, continuing with cached config

E/RecyclerView: No adapter attached; skipping layout

FATAL EXCEPTION: main
java.lang.IndexOutOfBoundsException: Index 4 out of bounds for length 0
    at com.example.app.HomeAdapter.onBindViewHolder(HomeAdapter.kt:88)

If I only pasted the fatal exception into a model, the answer would probably focus on HomeAdapter.

But the rolling log window shows a better story:

Config parsing failed.
The app swallowed the exception and continued.
The fallback state was empty or malformed.
The adapter crashed later.

That is exactly the kind of issue that a continuous local analyst is better positioned to catch.

Gemma 4's summary was more useful when I asked it to separate trigger, root cause, and visible crash:

{
  "visible_crash": "IndexOutOfBoundsException in HomeAdapter.onBindViewHolder",
  "probable_trigger": "Remote config parsing failed because max_items was missing.",
  "root_cause_hypothesis": "The app continues after ConfigRepository.refresh() fails, but downstream UI code assumes the config produced a non-empty item list.",
  "risk": "The try/catch hides the real failure and converts it into a later UI crash.",
  "suggested_fix": [
    "Return a typed error or safe default from ConfigRepository instead of swallowing the exception.",
    "Make HomeViewModel expose an error/empty state when config parsing fails.",
    "Guard HomeAdapter binding against mismatched item counts."
  ]
}

This is the moment where the local model became more interesting than a manual prompt.

It was not just answering a question.

It was watching enough context to notice that the question I would have asked was incomplete.
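Part of this pattern can also be flagged cheaply before the model sees the window. The sketch below is a heuristic, not what the watcher in this post does: it lists warning or error lines that are immediately followed by an exception line and that appear before the fatal crash. The earlierSuspects name and the regular expressions are assumptions.

```javascript
// Sketch: find W/ or E/ lines that carry an exception and precede the fatal crash.
function earlierSuspects(lines) {
  const fatalIndex = lines.findIndex((line) => line.includes("FATAL EXCEPTION"));
  const end = fatalIndex === -1 ? lines.length : fatalIndex;
  const suspects = [];
  for (let i = 0; i < end - 1; i += 1) {
    const isWarning = /^[WE]\//.test(lines[i]);
    // An exception line starts with a class name ending in Exception or Error.
    const nextIsException = /^\S+(Exception|Error)\b/.test(lines[i + 1]);
    if (isWarning && nextIsException) {
      suspects.push({ lineNumber: i + 1, warning: lines[i], exception: lines[i + 1] });
    }
  }
  return suspects;
}

const logWindow = [
  "W/ConfigRepository: Failed to parse remote config, using defaults",
  "org.json.JSONException: No value for max_items",
  "    at org.json.JSONObject.get(JSONObject.java:398)",
  "W/HomeViewModel: Config refresh failed, continuing with cached config",
  "E/RecyclerView: No adapter attached; skipping layout",
  "FATAL EXCEPTION: main",
  "java.lang.IndexOutOfBoundsException: Index 4 out of bounds for length 0"
];
console.log(earlierSuspects(logWindow)); // one suspect: the ConfigRepository warning
```

A hit from this kind of check could be added to the prompt as a hint, leaving the judgment about whether it is the real root cause to the model.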

Letting It Read Nearby Code

Logs alone are useful, but logs plus a small amount of local source context are much better.

When the analyzer saw ConfigRepository.kt:61, the next step was to read that file locally and include the surrounding function:

class ConfigRepository {
    fun refresh(rawJson: String): AppConfig {
        return try {
            parse(rawJson)
        } catch (e: Exception) {
            Log.w("ConfigRepository", "Failed to parse remote config, using defaults", e)
            AppConfig.empty()
        }
    }

    private fun parse(rawJson: String): AppConfig {
        val json = JSONObject(rawJson)
        return AppConfig(
            maxItems = json.getInt("max_items"),
            title = json.getString("title")
        )
    }
}

This is where the suggestion became more like code review than log parsing.

Gemma 4 pointed out that AppConfig.empty() was not a neutral fallback. It changed the state shape in a way the UI did not expect. The crash happened later, but the bug was born here.

The suggested improvement was not "catch fewer exceptions" in the abstract. It was more specific:

sealed interface ConfigRefreshResult {
    data class Success(val config: AppConfig) : ConfigRefreshResult
    data class Failed(val reason: String, val fallback: AppConfig?) : ConfigRefreshResult
}

Then the UI layer can make an explicit decision:

show cached config if it exists
show an empty state if there are no items
show an error state if the config is invalid
avoid binding an adapter with impossible assumptions

That is the workflow I care about:

notice the crash
find the earlier suspicious log
read the nearby source
suggest a safer boundary

That is more useful than "here is what IndexOutOfBoundsException means."

Example 3: Gradle Failures Are Mostly Triage

Gradle output is a different kind of problem. It is usually long, repetitive, and full of noise.

For example:

Execution failed for task ':app:compileDebugKotlin'.
> A failure occurred while executing org.jetbrains.kotlin.compilerRunner.GradleCompilerRunnerWithWorkers$GradleKotlinCompilerWorkAction
   > Compilation error. See log for more details

e: file:///Users/me/project/app/src/main/java/com/example/CheckoutViewModel.kt:44:21
Type mismatch: inferred type is String? but String was expected

The useful job for Gemma 4 is not to explain Kotlin nullability from scratch.

The useful job is to reduce the build output to:

Primary failure:
- CheckoutViewModel.kt:44 passes a nullable String into a non-null parameter.

Likely fix:
- Check whether user.email can be null.
- Either validate before calling submitOrder(), provide a fallback, or change the called function to accept String? if null is valid.

Ignore:
- Gradle worker wrapper stack
- Generic "Compilation error" line
- Repeated task execution noise

This is a small thing, but small things matter when they happen 30 times a day.
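Some of this triage does not need a model at all. As a sketch, the Kotlin compiler error lines in the sample above can be pulled out with a regular expression before the rest of the output goes to Gemma 4. The primaryKotlinErrors name is hypothetical, and the pattern only covers the "e: file://" layout shown above.

```javascript
// Matches Kotlin compiler errors of the form:
//   e: file:///abs/path/File.kt:44:21
// The message may be on the same line or, as in the sample above, on the next one.
const KOTLIN_ERROR = /^e: file:\/\/(.+?):(\d+):(\d+)\s*(.*)$/;

function primaryKotlinErrors(output) {
  const lines = output.split(/\r?\n/);
  const errors = [];
  lines.forEach((line, index) => {
    const match = KOTLIN_ERROR.exec(line);
    if (!match) return;
    const [, file, lineNo, column, inlineMessage] = match;
    errors.push({
      file,
      line: Number(lineNo),
      column: Number(column),
      message: inlineMessage || (lines[index + 1] ?? "").trim()
    });
  });
  return errors;
}

const gradleOutput = [
  "Execution failed for task ':app:compileDebugKotlin'.",
  "   > Compilation error. See log for more details",
  "",
  "e: file:///Users/me/project/app/src/main/java/com/example/CheckoutViewModel.kt:44:21",
  "Type mismatch: inferred type is String? but String was expected"
].join("\n");
console.log(primaryKotlinErrors(gradleOutput));
```

The extracted file and line give read_file a concrete target, and the model only has to explain the failure rather than find it.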

The Ring Buffer

The watcher does not send my entire terminal history to the model.

It keeps a rolling window:

class RingBuffer {
  constructor(maxLines = 500) {
    this.maxLines = maxLines;
    this.lines = [];
  }

  push(text) {
    for (const line of String(text).split(/\r?\n/)) {
      if (line.trim()) this.lines.push(line);
    }
    if (this.lines.length > this.maxLines) {
      this.lines = this.lines.slice(this.lines.length - this.maxLines);
    }
  }

  snapshot() {
    return this.lines.join("\n");
  }
}

That sounds basic, but it is the key difference from manual copy/paste. The model sees what happened around the crash, not only the final red line.
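How lines reach the buffer is not shown in the post. One way to wire it up, sketched under assumptions, is to spawn the log command and read its output line by line, marking the lines that should wake the analyzer. The trigger patterns and the watchCommand and isTrigger names are illustrative.

```javascript
const { spawn } = require("node:child_process");
const readline = require("node:readline");

// Lines that should wake the analyzer. The patterns are assumptions for this sketch.
const TRIGGERS = [/FATAL EXCEPTION/, /Execution failed for task/, /^E\//];

function isTrigger(line) {
  return TRIGGERS.some((pattern) => pattern.test(line));
}

// Stream a command's stdout line by line into onLine(line, triggered).
function watchCommand(command, args, onLine) {
  const child = spawn(command, args, { stdio: ["ignore", "pipe", "inherit"] });
  const reader = readline.createInterface({ input: child.stdout });
  reader.on("line", (line) => onLine(line, isTrigger(line)));
  return child;
}

// Usage (requires adb on PATH and a connected device or emulator):
// watchCommand("adb", ["logcat", "-v", "tag"], (line, triggered) => {
//   buffer.push(line);
//   if (triggered) scheduleAnalysis(buffer.snapshot()); // hypothetical helper
// });
```

Every line goes into the buffer, but only trigger lines start a model call, which keeps the local model from running on every log line.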

Why I Would Not Use A Large Model For This First

Large models are better at deep reasoning. I still use them when the failure crosses multiple modules, touches architecture, or needs a careful patch.

But continuous log analysis has different constraints.

It needs to be:

cheap
private
low-friction
always available
good at summarizing repetitive noise
able to run before I know there is a problem

That is the important distinction.

For deep debugging, I use the strongest model I can get.

For continuous debugging, I keep the model close to the logs.

Gemma 4 on a developer laptop fits that second role.

The Prompt Shape That Worked Best

The most reliable prompt was not conversational. It was closer to a small incident-report contract:

You are a local crash-log analyst.

Analyze the log window below.

Return JSON with:
- severity: "ignore" | "warning" | "crash" | "build_failure"
- primary_signal: the line or event that matters most
- likely_root_cause: concise hypothesis
- noisy_lines: log patterns that are probably irrelevant
- missing_context: files, commands, or runtime state needed to confirm
- next_steps: 1 to 3 concrete debugging steps
- source_files_to_read: likely files/classes/functions

Rules:
- Do not invent files.
- If the log is insufficient, say what is missing.
- Separate the visible crash from earlier suspicious events.
- Treat swallowed exceptions as suspicious.
- Prefer practical debugging steps over generic explanations.
- Use read_file only when a log points to a concrete local file.
- Ring the bell only for new or high-confidence findings.

One rule mattered more than the rest:

Treat swallowed exceptions as suspicious.

Without that, the model often focused too much on the final crash. With it, the model started paying attention to warning/error logs that appeared earlier but did not crash the app immediately.

The actual loop is a normal tool-calling loop: call Ollama, execute any tool calls, send the tool results back, and stop when the model returns a final JSON finding.

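// `ollama.chat` stands for the /api/chat request shown earlier.
async function analyzeWindow(ringBuffer) {
  const messages = [
    { role: "system", content: SYSTEM_PROMPT },
    { role: "user", content: buildUserPrompt({ buffer: ringBuffer.snapshot() }) }
  ];

  // Cap the rounds so the model cannot loop on tool calls forever.
  for (let round = 0; round < 5; round += 1) {
    const response = await ollama.chat({ model: "gemma4:26b", messages, tools });
    const message = response.message;
    messages.push(message);

    // No tool calls means the model has produced its final JSON finding.
    if (!message.tool_calls?.length) {
      return JSON.parse(message.content);
    }

    // Run each requested tool and hand the result back to the model.
    for (const call of message.tool_calls) {
      const result = await executeTool(call);
      messages.push({
        role: "tool",
        tool_name: call.function.name,
        content: JSON.stringify(result)
      });
    }
  }

  throw new Error("No final finding after 5 tool-calling rounds");
}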

In my test, gemma4:26b used the read_file tool when the prompt asked it to inspect ConfigRepository.kt, then continued from the tool result. That is the part that makes the log watcher feel less like a summarizer and more like a local debugging assistant.
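The loop ends in JSON.parse, which throws if the model wraps its answer in prose. A more forgiving final step is sketched below: extract the JSON object from the reply and check it against the contract from the prompt before it reaches the viewer. The parseFinding name and the choice of which fields to check are assumptions.

```javascript
// Severity values taken from the prompt contract above.
const SEVERITIES = ["ignore", "warning", "crash", "build_failure"];

// Take the span from the first "{" to the last "}", in case the model
// surrounds its JSON with extra text, then validate the required fields.
function parseFinding(content) {
  const start = content.indexOf("{");
  const end = content.lastIndexOf("}");
  if (start === -1 || end <= start) {
    return { ok: false, error: "no JSON object in reply" };
  }
  let finding;
  try {
    finding = JSON.parse(content.slice(start, end + 1));
  } catch (err) {
    return { ok: false, error: `invalid JSON: ${err.message}` };
  }
  if (!SEVERITIES.includes(finding.severity)) {
    return { ok: false, error: `unknown severity: ${finding.severity}` };
  }
  if (typeof finding.likely_root_cause !== "string") {
    return { ok: false, error: "likely_root_cause must be a string" };
  }
  return { ok: true, finding };
}

const reply = 'Here is the finding: {"severity":"crash","likely_root_cause":"detached fragment"} Done.';
console.log(parseFinding(reply));
```

When validation fails, the error string can be sent back to the model as one more message, which often costs less than discarding the whole analysis.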

Where Gemma 4 Helped

Gemma 4 was useful for:

compressing noisy logs into a short issue summary
grouping repeated errors
separating primary failures from wrapper stack traces
noticing earlier warnings before a crash
suggesting which source files to inspect
producing structured JSON that the local dashboard renders

The biggest win was not raw intelligence.

The biggest win was presence.

Because the model can run locally, it can be part of the normal dev loop. It does not need me to decide that a log line is important enough to upload somewhere.

Where It Was Not Enough

Gemma 4 was less reliable when:

the log needed domain knowledge that was only in the codebase
the crash involved async ordering across several classes
the real issue depended on backend state
the stack trace pointed to generated code
the logs were too aggressively filtered before reaching the model

The fix was not to pretend the local model could know everything.

The fix was to let it ask for context:

{
  "missing_context": [
    "HomeViewModel.loadHome()",
    "ConfigRepository.refresh()",
    "HomeAdapter.getItemCount()",
    "The JSON payload used for remote config"
  ]
}

That is the right boundary.

Gemma 4 should not hallucinate the code. It should tell the developer or tool which code to read next.

The Workflow I Use Now

The workflow I use now is simple:

A local watcher tails adb logcat, Gradle output, and test output.
Gemma 4 continuously turns noisy streams into issue candidates.
The tool groups repeated failures instead of spamming me.
New high-confidence findings ring a bell so I actually notice them.
The local viewer on localhost:3001 shows recent findings in Chrome.
When a suspicious event points to code, Gemma 4 uses read_file to inspect the local source.
It suggests one small next step.
If the problem is deep, I escalate the compact summary and selected files to a stronger model.

That gives each model the right job.

Gemma 4 handles continuous private observation.

A bigger model handles expensive reasoning on demand.

My Take

Before trying this, I thought of local models mostly as a privacy story.

That is still true, but it is not the whole story.

Local models also change when we can use AI.

If every model call is expensive, remote, and deliberate, AI becomes something we invoke after we notice a problem.

If a capable enough model is running on the same machine as the logs, AI can start watching for weak signals before the problem becomes obvious.

That is especially useful for mobile development, where crashes are surrounded by lifecycle noise, device noise, framework noise, and build-tool noise.

Gemma 4 does not need to be the smartest debugger in the world to be useful here.

It just needs to be local, cheap, private, and good enough to say:

"This crash is probably not where the bug started. Read the warning 40 lines above it."

That is a small sentence.

But during a debugging session, it can save an hour.
