Before You Send Logs to Gemini — Strip the PII First

#ai #gemini #rust #security

All tests run on an 8-year-old MacBook Air.

Android logs contain more than stack traces.

User IDs. Email addresses. IP addresses. Phone numbers. Auth tokens that slipped into a debug log. Device identifiers.

Before you send logcat output to any AI API — including Gemini — strip the sensitive data. Here's the filter I built into HiyokoLogcat.

What logcat actually leaks

Real examples from production apps I've debugged:

D/Network: Connecting to 192.168.1.105:8080
I/Auth: User token: eyJhbGciOiJIUzI1NiJ9...
D/User: Loading profile for user@example.com
I/Device: Serial: R58M123ABCD

None of this should go to an external API. Especially not to a free-tier API where the terms say data may be used for training.

The filter

A regex pass over each line before it leaves the device:

use regex::Regex;
use once_cell::sync::Lazy;

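// IPv4 addresses (dotted quad; octet ranges are not validated).
static IP_RE: Lazy<Regex> = Lazy::new(||
    Regex::new(r"\b(?:\d{1,3}\.){3}\d{1,3}\b").unwrap()
);
// Email addresses.
static EMAIL_RE: Lazy<Regex> = Lazy::new(||
    Regex::new(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b").unwrap()
);
// Runs of 20+ base64 or base64url characters (JWT segments use `-` and `_`),
// plus optional `=` padding.
static TOKEN_RE: Lazy<Regex> = Lazy::new(||
    Regex::new(r"\b[A-Za-z0-9+/_-]{20,}={0,2}").unwrap()
);
// Phone-like digit groups separated by optional hyphens or spaces.
static PHONE_RE: Lazy<Regex> = Lazy::new(||
    Regex::new(r"\b\d{2,4}[-\s]?\d{2,4}[-\s]?\d{4}\b").unwrap()
);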

/// Masks common PII patterns in a single log line.
pub fn mask_pii(line: &str) -> String {
    // Specific patterns run before the broad token pattern.
    let line = IP_RE.replace_all(line, "[IP]");
    let line = EMAIL_RE.replace_all(&line, "[EMAIL]");
    let line = TOKEN_RE.replace_all(&line, "[TOKEN]");
    let line = PHONE_RE.replace_all(&line, "[PHONE]");
    line.into_owned()
}
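
Logs usually arrive as a multi-line buffer rather than one line at a time. The sketch below shows one way to apply a per-line masker to a whole buffer before building a request. The name `mask_buffer` is hypothetical and not taken from HiyokoLogcat; the masker is passed in as a parameter so the example compiles with the standard library alone.

```rust
/// Applies `mask` to every line of a captured log buffer and joins the
/// results with '\n'. A trailing newline in the input is not preserved.
/// `mask_buffer` is a hypothetical helper, not part of HiyokoLogcat.
pub fn mask_buffer<F>(raw: &str, mask: F) -> String
where
    F: Fn(&str) -> String,
{
    raw.lines()
        .map(|line| mask(line))
        .collect::<Vec<String>>()
        .join("\n")
}
```

With the filter above in scope, the call would presumably look like `mask_buffer(&captured, mask_pii)`, where `captured` is whatever string holds the logcat output.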

After masking:

D/Network: Connecting to [IP]:8080
I/Auth: User token: [TOKEN]
D/User: Loading profile for [EMAIL]

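The stack trace and error context largely survive. The IP address, token, and email address don't reach Gemini. The device serial from the earlier example is a different story: none of the four patterns match it, so that line passes through unmasked. Identifiers like that need their own rule.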
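
Values such as the serial R58M123ABCD have no distinctive shape, so matching on the label in front of them is one option. This is a minimal sketch under assumptions: the label list, the `[DEVICE_ID]` placeholder, and the function name are all hypothetical and would need adjusting to what your app actually logs.

```rust
// Hypothetical labels whose following value should be masked.
// Matching is case-sensitive.
const SENSITIVE_LABELS: &[&str] = &["Serial:", "IMEI:", "AndroidId:"];
// Hypothetical placeholder, in the same style as [IP] and [EMAIL].
const DEVICE_PLACEHOLDER: &str = "[DEVICE_ID]";

/// Replaces the whitespace-delimited value that follows a sensitive label.
/// Only the first occurrence of each label in the line is handled.
pub fn mask_labeled_values(line: &str) -> String {
    let mut out = line.to_string();
    for &label in SENSITIVE_LABELS {
        if let Some(pos) = out.find(label) {
            let after_label = pos + label.len();
            let rest = &out[after_label..];
            // Skip the whitespace between the label and its value.
            let value = rest.trim_start();
            let value_start = after_label + (rest.len() - value.len());
            // The value ends at the next whitespace or at end of line.
            let value_len = value.find(char::is_whitespace).unwrap_or(value.len());
            if value_len > 0 {
                out.replace_range(value_start..value_start + value_len, DEVICE_PLACEHOLDER);
            }
        }
    }
    out
}
```

Applied to `I/Device: Serial: R58M123ABCD`, this returns `I/Device: Serial: [DEVICE_ID]`. Lines without a listed label come back unchanged.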

Tell users what you're doing

Even with masking, users should know their logs are being sent externally. HiyokoLogcat shows a disclaimer in settings:

"The free Gemini API may use submitted data for model training. Log lines are automatically masked for common PII before sending, but review your logs before using AI diagnosis on sensitive apps."

Transparency matters. Especially for developer tools where the logs might contain production data.
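
One way to back the disclaimer with something concrete is to show users what was masked before anything is sent. The sketch below counts placeholders in already-masked text. The name `mask_summary` is hypothetical, and the placeholder list simply mirrors the four strings used by `mask_pii`.

```rust
// Placeholders written by the masking pass (mirrors `mask_pii`).
const PLACEHOLDERS: &[&str] = &["[IP]", "[EMAIL]", "[TOKEN]", "[PHONE]"];

/// Counts how often each placeholder occurs in already-masked text.
/// Placeholders that never occur are left out of the result.
/// Hypothetical helper for a "review before sending" screen.
pub fn mask_summary(masked: &str) -> Vec<(&'static str, usize)> {
    PLACEHOLDERS
        .iter()
        .map(|&p| (p, masked.matches(p).count()))
        .filter(|&(_, n)| n > 0)
        .collect()
}
```

A settings or preview screen could render the result as something like "1 IP, 1 email, 1 token masked" next to the masked text. The count would also include any placeholder string that was already present in the raw log, which should be rare.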

The token regex caveat

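Base64-like strings appear everywhere in logs, not just in auth tokens. The token regex will also mask things like encoded image previews, checksums, and random IDs. Because it matches any run of 20 or more letters and digits, it catches long identifiers too: NullPointerException is exactly 20 characters, so it becomes [TOKEN] as well.

For checksums and IDs, that's acceptable. A masked checksum doesn't break the diagnosis. A leaked auth token is a much bigger problem. A masked exception name does cost some context, so look at what your masked stack traces actually contain.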

When in doubt, mask more.
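
If the token pattern hides too much, for example by masking NullPointerException (20 letters, so it matches), one possible refinement is to skip candidates made up only of letters. This is a sketch of that trade-off, not what HiyokoLogcat does: `should_mask_token` is a hypothetical name, and a short random token that happens to contain only letters would slip through.

```rust
/// Decides whether a string matched by the token pattern should be masked.
/// Letters-only candidates are treated as ordinary identifiers (class or
/// method names) and kept; anything with a digit or symbol is masked.
/// Hypothetical heuristic: it trades a small leak risk for readable traces.
pub fn should_mask_token(candidate: &str) -> bool {
    !candidate.chars().all(|c| c.is_ascii_alphabetic())
}
```

The regex crate's `replace_all` accepts a closure over `&regex::Captures`, so the check could be applied per match: return `"[TOKEN]"` when `should_mask_token(&caps[0])` is true and the original text otherwise. If you are unsure whether your logs contain letters-only secrets, keep the unconditional mask.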

HiyokoLogcat is free and open source → github.com/hiyoyok/HiyokoLogcat
X → @hiyoyok