The Ethics of "Offline-First" AI: Privacy as a Competitive Advantage

#rust #tauri #ai #privacy

All tests were run on an 8-year-old Intel MacBook Air. When I added AI diagnostics to my logcat viewer, the first question wasn't "which model" but "how do I keep user data off someone else's server."

In the age of cloud-everything, privacy has become a luxury. When I built the AI features for HiyokoLogcat and HiyokoPDFVault, I faced a real engineering challenge: how do you give developers the power of LLMs without exposing their sensitive log data? The answer lies in an "Offline-First" architecture with rigorous local masking.

TL;DR

Logcat output contains sensitive data (IPs, tokens, emails) that must be masked before any external API call.
A Rust-based regex masking pipeline runs locally, replacing sensitive patterns with placeholders before data reaches Gemini.
HiyokoPDFVault uses candle-core for fully local semantic search and OCR—no server involved.
Users get explicit, opt-in consent dialogs before any data leaves the device.

The Problem: Logs Are Full of Secrets

Logcat output from an Android device is a debugging goldmine, but it routinely contains data that should never leave the developer's machine:

// Typical logcat output with sensitive data
D/NetworkManager: Connected to 192.168.1.105:5555
D/AuthService: Bearer token: eyJhbGciOiJSUzI1NiIsInR5cCI6Ikp...
I/UserManager: Logged in user: dev@company-internal.com
W/CrashHandler: Stack trace uploaded to https://internal-sentry.corp.net/api/12345

Sending this directly to a cloud AI for analysis is a security incident waiting to happen. IP addresses reveal network topology. Bearer tokens grant access. Email addresses identify individuals.

Local Masking Pipeline

Before a single log line is sent to the Gemini API, HiyokoLogcat runs a Rust-based masking pipeline. The pipeline is fast enough to process thousands of lines per second on my dual-core MacBook Air:

use regex::Regex;
use once_cell::sync::Lazy;

// Pre-compiled regex patterns for performance
static MASKING_RULES: Lazy<Vec<MaskRule>> = Lazy::new(|| {
    vec![
        MaskRule::new(
            r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}(:\d+)?\b",
            "[IP_MASKED]"
        ),
        MaskRule::new(
            r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}",
            "[EMAIL_MASKED]"
        ),
        MaskRule::new(
            r"(?i)(bearer|token|api[_-]?key|password|secret)\s*[:=]\s*\S+",
            "[CREDENTIAL_MASKED]"
        ),
        MaskRule::new(
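            // `*` rather than `+`, so hosts that begin with the keyword
            // (e.g. https://internal.example.com) are matched too
            r"https?://[^\s]*(?:internal|corp|staging|local)[^\s]*",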
            "[INTERNAL_URL_MASKED]"
        ),
    ]
});

struct MaskRule {
    pattern: Regex,
    replacement: &'static str,
}

impl MaskRule {
    fn new(pattern: &str, replacement: &'static str) -> Self {
        MaskRule {
            pattern: Regex::new(pattern).expect("Invalid regex pattern"),
            replacement,
        }
    }
}

/// Mask all sensitive patterns in a log line
pub fn mask_sensitive(line: &str) -> String {
    let mut result = line.to_string();
    for rule in MASKING_RULES.iter() {
        // replace_all returns a Cow; into_owned reuses the buffer when one was allocated
        result = rule.pattern.replace_all(&result, rule.replacement).into_owned();
    }
    result
}

After masking, the same logcat output looks like this:

// Masked output — safe to send to Gemini API
D/NetworkManager: Connected to [IP_MASKED]
D/AuthService: Bearer [CREDENTIAL_MASKED]
I/UserManager: Logged in user: [EMAIL_MASKED]
W/CrashHandler: Stack trace uploaded to [INTERNAL_URL_MASKED]

The AI can still diagnose the issue ("NetworkManager connected successfully, then AuthService sent a token, then the user logged in") without ever seeing the actual sensitive values.
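The "thousands of lines per second" figure is easy to check on your own hardware. The following is a minimal sketch, not the tool's actual benchmark: measure_throughput, Throughput, and the stand-in masker are hypothetical names introduced for illustration, and the stand-in uses a plain string replace so the snippet compiles without the regex crate. In the real app you would pass mask_sensitive instead.

```rust
use std::time::Instant;

/// Result of one timing run over a batch of log lines.
struct Throughput {
    lines: usize,
    lines_per_sec: f64,
}

/// Run `masker` over every line and report lines processed per second.
/// `masker` is any function with the same shape as `mask_sensitive`.
fn measure_throughput<F>(lines: &[&str], masker: F) -> Throughput
where
    F: Fn(&str) -> String,
{
    let start = Instant::now();
    let mut total_len = 0usize;
    for line in lines {
        // Use the result so the optimizer cannot discard the work.
        total_len += masker(line).len();
    }
    let elapsed = start.elapsed().as_secs_f64();
    std::hint::black_box(total_len);

    Throughput {
        lines: lines.len(),
        // Guard against a zero reading from a coarse clock.
        lines_per_sec: lines.len() as f64 / elapsed.max(f64::EPSILON),
    }
}

fn main() {
    // The four sample lines from the article, repeated to form a batch.
    let sample = [
        "D/NetworkManager: Connected to 192.168.1.105:5555",
        "D/AuthService: Bearer token: eyJhbGciOiJSUzI1NiIsInR5cCI6Ikp...",
        "I/UserManager: Logged in user: dev@company-internal.com",
        "W/CrashHandler: Stack trace uploaded to https://internal-sentry.corp.net/api/12345",
    ];
    let batch: Vec<&str> = sample.iter().cycle().take(10_000).copied().collect();

    // Stand-in masker (assumption): replace with `mask_sensitive` in the real app.
    let stand_in = |line: &str| line.replace("192.168.1.105:5555", "[IP_MASKED]");

    let report = measure_throughput(&batch, stand_in);
    println!("{} lines, {:.0} lines/sec", report.lines, report.lines_per_sec);
}
```

Build with --release before comparing numbers; debug builds of regex-heavy code are much slower and would understate the pipeline's real throughput.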

Fully Local AI with candle-core

For HiyokoPDFVault, I went further. PDF documents can contain attorney-client privileged material, medical records, or financial data. Sending these to any cloud API—even with masking—is unacceptable for many users.

Instead, HiyokoPDFVault uses candle-core to run a small embedding model locally, on the GPU via Metal when it is available and on the CPU otherwise:

// Result is assumed to be candle_core's alias; the app may use a different error type
use candle_core::{Device, Result, Tensor};

/// Initialize the local embedding engine on macOS Metal GPU
fn init_local_engine() -> Result<EmbeddingEngine> {
    // Use Metal backend on macOS for GPU acceleration
    let device = Device::new_metal(0)
        .unwrap_or_else(|_| Device::Cpu);  // fallback to CPU

    // Load a compact embedding model (~50MB)
    // Small enough for an 8-year-old MacBook Air
    let model = load_embedding_model(&device)?;

    Ok(EmbeddingEngine { model, device })
}

/// Generate embeddings for semantic search — fully local
fn embed_text(engine: &EmbeddingEngine, text: &str) -> Result<Vec<f32>> {
    let tokens = tokenize(text)?;
    let input = Tensor::new(&tokens[..], &engine.device)?;
    let embeddings = engine.model.forward(&input)?;

    // Normalize for cosine similarity search
    let normalized = l2_normalize(&embeddings)?;
    Ok(normalized.to_vec1()?)
}

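EmbeddingEngine, load_embedding_model, tokenize, and l2_normalize are application helpers that are not shown here, and the actual implementation handles additional edge cases.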

The result: users can search their PDF library semantically ("find contracts mentioning penalty clauses") without a single byte leaving their machine. On my MacBook Air's integrated GPU, this runs at about 200ms per query—not instant, but fast enough for interactive use.
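To make the "normalize for cosine similarity" step concrete, here is a minimal sketch of the search side on plain f32 slices. It is an illustration under assumptions, not HiyokoPDFVault's code: the l2_normalize here is a slice-based stand-in for the tensor helper above, dot and top_k are hypothetical names, and the three-dimensional vectors are toy data standing in for real model embeddings.

```rust
/// Scale a vector to unit length; a zero vector is returned unchanged.
fn l2_normalize(v: &[f32]) -> Vec<f32> {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm == 0.0 {
        return v.to_vec();
    }
    v.iter().map(|x| x / norm).collect()
}

/// Dot product; equals cosine similarity when both inputs are unit length.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Return (index, score) for the `k` documents most similar to `query`.
fn top_k(query: &[f32], docs: &[Vec<f32>], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = docs
        .iter()
        .enumerate()
        .map(|(i, d)| (i, dot(query, d)))
        .collect();
    // Highest score first; total_cmp gives a total order even with NaN.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}

fn main() {
    // Toy "embeddings"; real ones would come from the local model.
    let docs: Vec<Vec<f32>> = vec![
        l2_normalize(&[1.0, 0.0, 0.0]),
        l2_normalize(&[0.0, 1.0, 0.0]),
        l2_normalize(&[1.0, 1.0, 0.0]),
    ];
    let query = l2_normalize(&[2.0, 0.1, 0.0]);

    for (idx, score) in top_k(&query, &docs, 2) {
        println!("doc {idx}: {score:.3}");
    }
}
```

Because every stored vector is normalized once at indexing time, each query costs only one dot product per document. A linear scan like this is usually adequate for a personal PDF library; a larger corpus would likely need an approximate index.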

Explicit User Consent

Privacy shouldn't be a black box. When HiyokoLogcat or HiyokoBar is about to call the Gemini API, the user sees exactly what's happening:

A dialog shows the masked log content that will be sent.
The user can review, edit, or cancel before any network request fires.
The API key is stored in the macOS Keychain, never in a plain config file.

This opt-in model means the default state is "nothing leaves your machine." External AI is an explicit action, not an ambient feature.
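One way to enforce "nothing leaves your machine by default" in code is to make the network function accept only a value that can be created by an approval. The sketch below is a hypothetical illustration of that idea, not the apps' actual implementation: the consent module, Decision, OutboundPayload, review, and send_to_api are all names invented here, and send_to_api is a stand-in that performs no network I/O.

```rust
mod consent {
    /// What the user chose in the review dialog.
    pub enum Decision {
        /// Send this text (the user may have edited the masked log).
        Approved(String),
        /// Send nothing.
        Cancelled,
    }

    /// Text cleared for upload. The field is private, so code outside this
    /// module can only obtain a payload through `review`.
    pub struct OutboundPayload {
        body: String,
    }

    impl OutboundPayload {
        pub fn body(&self) -> &str {
            &self.body
        }
    }

    /// Turn a dialog decision into a payload, or `None` if nothing may be sent.
    pub fn review(decision: Decision) -> Option<OutboundPayload> {
        match decision {
            // An approval with empty text is treated like a cancel.
            Decision::Approved(text) if !text.trim().is_empty() => {
                Some(OutboundPayload { body: text })
            }
            _ => None,
        }
    }
}

use consent::{review, Decision, OutboundPayload};

/// Stand-in for the API call; it accepts only a reviewed payload.
fn send_to_api(payload: &OutboundPayload) -> usize {
    // A real implementation would issue the HTTPS request here.
    payload.body().len()
}

fn main() {
    let masked = "D/NetworkManager: Connected to [IP_MASKED]".to_string();

    // User approved: a payload exists and can be sent.
    if let Some(payload) = review(Decision::Approved(masked)) {
        println!("sending {} bytes", send_to_api(&payload));
    }

    // User cancelled: no payload exists, so there is nothing to send.
    assert!(review(Decision::Cancelled).is_none());
}
```

The design choice is that the compiler, rather than a runtime flag, guards the boundary: any code path that reaches send_to_api must have gone through the review step first.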

Privacy isn't a checkbox—it's an architecture. By building masking, local inference, and explicit consent into the foundation, "Offline-First" AI becomes a genuine competitive advantage, not a marketing claim.

How do you handle sensitive data when integrating AI into developer tools? Is local masking enough, or do you avoid cloud AI entirely?

If this was helpful, check out HiyokoPDFVault — PDF tools with AES-256 encryption, OCR, watermark, merge/split/compress, Bates numbering.

Built with Rust + Tauri v2. Tested on an 8-year-old MacBook Air.