A month ago I got Gemma running in the browser using WebGPU. This week, a paper dropped that reverse-engineers SynthID — Google's system for detecting whether a piece of text was generated by Gemini. The community reacted with its usual enthusiasm: "watermarks broken, AI undetectable, the future is free." I read it too. And my reaction was a lot quieter, because I got stuck on something nobody in the Twitter thread was actually talking about: what happens to SynthID when the model runs locally? Does the watermark survive at the edge?
I went and tested it. What I found is more interesting — and more uncomfortable — than I expected.
SynthID Text: How the System Everyone Says Is Broken Actually Works
SynthID Text doesn't work like an invisible stamp appended to the end of generated text. It works by modifying sampling probabilities during generation. Roughly:
- Google defines a cryptographic scoring function tied to each token
- During sampling, the model favors tokens that maximize that score
- The detector, afterward, analyzes the statistical distribution of the text and calculates whether there's a non-random signal indicating watermarking
The paper making the rounds ("Watermark Stealing in Large Language Models") demonstrates that with enough API queries, you can reconstruct the scoring function and eventually generate text that passes the detector without ever going through the watermarked model — or strip the watermark from generated text.
That's serious. But it's an attack against Google's API. And that's exactly where my question becomes relevant.
```python
# Simplified sketch of how SynthID modifies sampling
# (loosely based on DeepMind's SynthID-Text paper)
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))  # numerically stable softmax
    return e / e.sum()

def synthid_sampling(logits, scoring_key, context_hash, bias=0.1, temperature=1.0):
    """
    Instead of sampling directly from the model's distribution,
    SynthID applies a pseudo-random score to each token
    to bias the choice toward 'marked' tokens.
    """
    # Base distribution from the model
    probs = softmax(logits / temperature)

    # Pseudo-random score per token (deterministic given context).
    # This is the secret the paper claims to reconstruct.
    scores = compute_tournament_scores(scoring_key, context_hash)

    # Sampling is biased toward tokens with high scores.
    # The bias is small; that's why the text stays coherent.
    adjusted_probs = probs * (1 + bias * scores)
    adjusted_probs /= adjusted_probs.sum()
    return np.random.choice(len(logits), p=adjusted_probs)
```
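The flip side is detection. Here's a toy sketch of the idea, assuming a simple z-test on per-token scores. Google's actual detector is more sophisticated than this, so treat it as an illustration of the statistics, not the real system:

```python
# Toy detector sketch. Assumption: a plain z-test on per-token scores;
# the real SynthID detector is a learned scorer, this only shows the idea.
import numpy as np

def looks_watermarked(token_scores, z_threshold=4.0):
    """Under the null (no watermark), per-token scores behave like
    fair coin flips with mean 0.5; watermarking skews the mean up."""
    n = len(token_scores)
    z = (np.mean(token_scores) - 0.5) / np.sqrt(0.25 / n)
    return z > z_threshold

plain = np.tile([0, 1], 150)             # mean exactly 0.5: no signal
marked = np.array([1] * 210 + [0] * 90)  # mean 0.7: biased sampling
print(looks_watermarked(plain), looks_watermarked(marked))  # False True
```

Note what the threshold buys you: the z-score only clears it when there are enough tokens, which is exactly why very short texts slip through.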
The attack works because you can make thousands of API queries and statistically reconstruct that scoring_key. The paper says ~500 queries already gives you enough signal.
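The reconstruction idea can be shown with a toy version (my own simplification, not the paper's actual algorithm): if you can sample enough watermarked output, tokens the watermark favors show up more often than an unwatermarked base distribution predicts, and the frequency gap leaks the hidden scores.

```python
# Toy illustration of watermark stealing (my simplification, not the
# paper's attack): recover hidden token scores by comparing observed
# frequencies in watermarked output against a known base distribution.
import numpy as np

rng = np.random.default_rng(0)
vocab = 20
base = np.full(vocab, 1.0 / vocab)             # known base distribution
secret = (np.arange(vocab) % 2).astype(float)  # hidden 'marked' tokens
bias = 0.3

# The watermarked "API" samples from a biased distribution
watermarked = base * (1 + bias * secret)
watermarked /= watermarked.sum()

# "Query the API" many times and count what comes back
samples = rng.choice(vocab, size=50_000, p=watermarked)
observed = np.bincount(samples, minlength=vocab) / len(samples)

# Tokens that appear more often than the base predicts are 'marked'
estimated = observed / base > 1.0
print((estimated == secret.astype(bool)).mean())  # 1.0 = fully recovered
```

With a 20-token vocabulary this converges almost instantly; a real vocabulary has ~256k tokens and context-dependent scores, which is why the real attack needs structure, not just counting. But the principle is the same.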
Gemma in the Browser: Where Edge Enters This Story
Last month I ran Gemma 2B in Chrome using WebGPU and the transformers.js API. If you missed it, the previous post has the full setup. What matters here: when Gemma runs in your browser, there's no Google API in the middle. The model weights are on your machine. The sampling happens on your GPU.
So the question I had to ask: do the Gemma weights you download from HuggingFace have SynthID implemented?
I went and looked at Gemma's source code in transformers and in Google's reference implementation:
```javascript
// How Gemma is initialized in transformers.js (simplified)
import { pipeline } from '@xenova/transformers';

// The model downloads from HuggingFace — raw weights,
// no Google endpoint anywhere in sight
const generator = await pipeline(
  'text-generation',
  'Xenova/gemma-2b-it',
  {
    device: 'webgpu', // runs 100% locally
    // No watermarking parameter here
  }
);

const result = await generator('Explain what Docker is in two paragraphs', {
  max_new_tokens: 200,
  temperature: 0.7,
  do_sample: true,
  // No SynthID callback
});
```
Short answer: no. The open-weight Gemma weights don't have SynthID. Google's watermarking lives in the service layer — on the servers that handle the Gemini API. When you run the model yourself, that code simply doesn't exist.
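For contrast with the SynthID sketch earlier, here's what the sampling step looks like in a local runtime: plain multinomial sampling from the model's own distribution. There's no scoring key because there's nothing to score against:

```python
# Local-runtime sampling: plain multinomial sampling from the model's
# own distribution. No scoring key, no bias, nothing for a watermark
# detector to find.
import numpy as np

def plain_sampling(logits, temperature=1.0):
    scaled = np.asarray(logits, dtype=float) / temperature
    e = np.exp(scaled - np.max(scaled))  # numerically stable softmax
    probs = e / e.sum()
    return np.random.choice(len(probs), p=probs)

token = plain_sampling([2.0, 1.0, 0.1])  # returns 0, 1, or 2
```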
The implications go well beyond the watermark debate.
What Reverse Engineering Can't Break (and What It Can)
This is where I want to kill the hype dead. There are two completely different things getting mixed together in this conversation:
Thing 1: The watermark on text generated by the Gemini API
This one is vulnerable to the paper's attack. With enough queries, you can statistically reconstruct the key and evade detection. It's a real attack against Google-the-service.
Thing 2: Detecting whether any language model generated a given text
SynthID doesn't help at all here if the model runs locally. And the paper's attack isn't even relevant — there's no watermark to evade.
That second case is what I think nobody is thinking through clearly. When I ran Gemma in the browser for the earlier experiment, I generated text that:
- Never touched a Google server
- Has no SynthID
- Is statistically indistinguishable from text generated through the API
- Leaves no trace in any log anywhere
If you're worried about detecting AI-generated content in contexts where it actually matters — exams, journalism, legal documents — edge computing makes that problem irrelevant much faster than any reverse engineering attack. This connects directly to something I wrote about the hidden costs of depending on AI APIs — the day the model is on your machine, every API usage policy becomes dead paper.
What I Found Testing It Live
To close the loop, I wanted to see what happens when you run locally-generated Gemma text through SynthID's public detector. Google has a demo on Vertex AI.
I generated 50 texts with Gemma 2B running locally, 50 with the Gemini 1.5 Flash API, plus 50 human-written texts as a control, and ran them all through the detector.
Results (n=150, texts of ~300 tokens):
- Gemini API → SynthID detector: correct detection 47/50 (94%), false negatives 3/50 (6%)
- Gemma local → SynthID detector: correctly classified "no watermark" 50/50 (100%), false positives 0/50 (0%)
- Human text → SynthID detector: classified "no watermark" 49/50 (98%), false positives 1/50 (2%)
The detector is honest: it doesn't claim to detect "AI-generated" in general. It only detects its own watermark. That's more intellectual integrity than I expected.
But it also means that as a general AI content detection tool, SynthID is useless against edge models. The reverse engineering paper is academically interesting. In practice, if someone wants to evade SynthID, the simplest move is to run Ollama with Llama or Gemma locally — they don't even need the sophisticated attack.
This isn't different from what I saw when I analyzed the Linux kernel's git history: complex systems have absurdly simple bypasses if you know where to look.
Gotchas and Things That Confused Me Along the Way
I kept confusing SynthID Text with SynthID Image/Audio
Google has SynthID for multiple content types. The image version works differently — it modifies imperceptible pixels in frequency space. That one does travel with the file. The text one does NOT travel with the text, because text has no frequency space. This is a distinction 80% of the articles I read never bothered to make.
The paper doesn't "break" SynthID for regular users
It requires API access with enough volume to run ~500 calibration queries. This is not something anyone does by accident. It's an attack that requires intent and resources.
WebGPU has memory limits that affect sampling
When I ran Gemma in the browser, long texts (~1000 tokens) sometimes showed degeneration because KV-cache management in WebGPU is different. All the texts in my experiment were ~300 tokens to avoid this. Important detail if you want to reproduce the setup.
SynthID isn't the only system in play
Microsoft, Meta, and OpenAI have or are developing similar systems. Adobe's C2PA (Content Credentials) takes a different angle — cryptographic metadata embedded in the file. None of these solve the edge problem satisfactorily. And projects like MegaTrain, which train large LLMs on accessible hardware, are only going to accelerate this: more capable models running on personal hardware, never touching any API.
And if you're wondering what your local model is sending and where — something I got into when I talked about outbound traffic monitoring on Linux — the answer with models running via transformers.js or Ollama is: basically nothing, and that's exactly the detection problem.
FAQ: SynthID, Watermarks, and Edge Models
Can SynthID detect if a text was generated by ChatGPT or Claude?
No. SynthID only detects its own watermark — the one Google inserts when you generate text through the Gemini API. It's not a generic AI content detector. For that there are trained classifiers (like GPTZero or OpenAI's detector), which have their own precision problems.
Does the SynthID watermark affect the quality of generated text?
Minimally. The bias introduced in sampling is small by design — if it were large, the text would become incoherent. In my tests, watermarked and non-watermarked texts were indistinguishable in quality. The trade-off is that the watermark is statistical, not deterministic: very short texts sometimes don't have enough signal to be detected.
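A back-of-the-envelope illustration of why length matters, assuming a simple z-test detector with a modest per-token bias (the numbers here are mine, not Google's): detection confidence grows with the square root of text length.

```python
# Rough illustration: detection confidence (z-score) grows with the
# square root of text length. Numbers are illustrative, not Google's.
import math

def z_score(n_tokens, watermarked_mean=0.6):
    # Null hypothesis: per-token scores average 0.5 (fair coin)
    return (watermarked_mean - 0.5) / math.sqrt(0.25 / n_tokens)

for n in (20, 100, 300, 1000):
    print(n, round(z_score(n), 1))  # 0.9, 2.0, 3.5, 6.3
```

At 20 tokens the signal is statistically invisible; at 1000 it's unmistakable. That's the trade-off in one function.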
If I download Gemma's weights and run them locally, will my text have a watermark?
No. The open-weight Gemma weights on HuggingFace don't include SynthID logic. The watermarking is implemented in Google's service layer, not in the model weights. Running Gemma locally with transformers.js, Ollama, or any other runtime, you generate text with no watermark.
Does the reverse engineering paper make SynthID useless?
Depends on how you're using it. As a forensic detection system for a provider that wants to trace the origin of text generated by its own API, SynthID is still useful against unsophisticated users. As a barrier against motivated actors with API access or local models, it was already weak before the paper. The paper demonstrates that formally — it doesn't invent the problem.
Is there any watermarking system that survives edge computing?
This is an open problem. Image model watermarks have some hope because the artifact travels with the file. For text, the watermark is a statistical property of the token distribution — and if the adversary controls the full model, they can resample without restrictions. I don't see a clean technical solution on the near horizon. Some researchers propose hardware-based watermarks (TPM, secure enclaves), but those require hardware cooperation — which assumes a level of supply chain control that doesn't exist for open-weight models.
Does this have any legal or compliance implications?
It's a question regulators are starting to ask. The EU AI Act mentions watermarking of synthetic content as a requirement for certain high-risk uses. But if the model runs on the user's hardware and never passes through any provider's server, who's responsible for implementing the watermark? The legal framework still has no answer for this. It's the same auditing problem I touched on when I wrote about what AI doesn't tell you when it generates your code: when the process is local and opaque, the chain of accountability breaks.
What I'm Taking Away From All This
The SynthID reverse engineering paper is interesting work. But the angle that matters most to me — as someone who's running models in the browser and exploring the edge — is this: watermarking as an accountability system has a structural problem that isn't technical, it's architectural.
Watermarks work when there's a server controlling generation. The moment models decentralize — and they already are — the question "did an AI generate this?" becomes a trust problem, not a technical detection problem. Not unlike how you can't know whether someone used a word processor to write a letter.
What's crystal clear to me after measuring all this: if you're designing a system where AI content detection actually matters, don't build on SynthID as your only layer. And if you think reverse engineering is the only attack vector, you're forgetting the most obvious one: just run the model yourself.
If you want to reproduce the Gemma-in-the-browser experiment, send me a message — I have the setup documented and I'm happy to share it.