Pico

Originally published at getcommit.dev

Google's AI Watermark Was Cracked. Here's What That Tells Us About AI Trust.

This week, researchers reverse-engineered SynthID — Google's invisible watermark baked into every Gemini-generated image. The method: collect 200 images from Gemini, average their noise patterns, isolate the consistent frequency-domain signature, and invert it. Result: 91% phase coherence drop, 75% carrier energy reduction.

The watermark was supposed to be invisible and unremovable. It's neither.

But the story isn't really about SynthID. It's about a fundamental property of cryptographic attestations vs. behavioral telemetry — and which one actually holds up when someone is trying to defeat it.


What SynthID Is (and Why It Seemed Safe)

SynthID works by embedding a watermark into the inference process itself. It doesn't stamp a badge onto the finished image. The watermark IS the image — built into how Gemini generates pixels.

This seemed clever. If the watermark is structural, not additive, you can't just strip it like removing metadata. The image without the watermark would be a fundamentally different image.

The researchers discovered the flaw: the watermark is consistent across outputs. When you ask Gemini to generate 200 different images, a common pattern — the watermark's frequency-domain signature — appears in all of them. Average the noise, isolate the pattern, and you've found the key.

The attack exploits the very property that made SynthID seem robust. Because the watermark is applied consistently across outputs, it is statistically observable.
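The averaging step is easy to demonstrate in miniature. The sketch below is not SynthID's actual scheme — it's a toy stand-in where every "generated" image carries the same faint fixed pattern — but it shows why collecting more samples exposes a systematic signal: the independent per-image content cancels like 1/sqrt(N) under averaging while the shared pattern survives at full strength.

```python
import numpy as np

rng = np.random.default_rng(0)
SIZE = 64

# Hypothetical stand-in for a structural watermark: the same faint
# pattern mixed into every "generated" image (not SynthID's real scheme).
watermark = rng.normal(0, 1, (SIZE, SIZE))

def generate_image():
    # Independent per-image content, much louder than the watermark.
    content = rng.normal(0, 10, (SIZE, SIZE))
    return content + watermark

def correlation(a, b):
    # Normalized cross-correlation between two arrays.
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Average N samples and check how well the result matches the watermark.
recovered = {}
for n in (1, 10, 200):
    avg = np.mean([generate_image() for _ in range(n)], axis=0)
    recovered[n] = correlation(avg, watermark)
    print(n, round(recovered[n], 3))
```

A single image correlates only weakly with the hidden pattern; by 200 samples the averaged residual correlates strongly with it — the same sample count the researchers used against Gemini.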


The General Principle

SynthID's vulnerability reveals something important about cryptographic attestations at scale:

Any signal that's systematically embedded can be statistically isolated.

This is true of:

  • Image watermarks (SynthID)
  • AI-generated text markers
  • Stylometric fingerprints
  • Steganographic signatures in model outputs

If you can collect many samples from the same system, you can extract the systematic component. This is basic signal processing — the averaging attack is as old as communications theory.

The implication for AI trust systems: if your proof of AI origin works by embedding a consistent pattern into outputs, an adversary with enough samples can find and remove that pattern.


How Behavioral Telemetry Is Different

Now consider a different question: not "does this image contain a watermark?" but "what has this agent actually done?"

Behavioral telemetry isn't a pattern embedded in outputs. It's a record of events in external systems:

  • What requests did the agent make?
  • To which endpoints, at what times, with what parameters?
  • What changes did it make? Did it revert them?
  • What did counterparties observe?
  • What commitments did it keep or break?

These are facts distributed across the world. They're not embedded in any single artifact. They can't be averaged out of existence because they're not a noise pattern — they're a causal history.

You can strip a watermark from an image. You can't retroactively un-send an email, un-execute a transaction, or un-read a file. The external world has already observed the behavior.
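One way to make that causal history tamper-evident is an append-only log where each event is hash-chained to the previous one: history can be extended, but not silently rewritten. This is a minimal illustrative sketch, not Commit's actual format — the class and field names here are assumptions.

```python
import hashlib
import json

class EventLog:
    """Append-only behavioral log; each event hashes over its predecessor."""

    def __init__(self):
        self.events = []
        self.head = "genesis"

    def record(self, agent, action, target):
        # Chain the new event to the current head before hashing it.
        event = {"agent": agent, "action": action,
                 "target": target, "prev": self.head}
        self.head = hashlib.sha256(
            json.dumps(event, sort_keys=True).encode()).hexdigest()
        event["hash"] = self.head
        self.events.append(event)
        return self.head

    def verify(self):
        # Re-walk the chain; any edit to past events breaks a link.
        prev = "genesis"
        for e in self.events:
            body = {k: v for k, v in e.items() if k != "hash"}
            if body["prev"] != prev:
                return False
            h = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()).hexdigest()
            if h != e["hash"]:
                return False
            prev = h
        return True

log = EventLog()
log.record("agent-1", "send_email", "user@example.com")
log.record("agent-1", "execute_tx", "tx-42")
print(log.verify())  # chain is intact

# Rewriting history after the fact breaks verification.
log.events[0]["target"] = "someone-else@example.com"
print(log.verify())  # chain no longer verifies
```

The point of the sketch: once an event is observed and chained, the only honest operations left are "append more history" — which is exactly the property a watermark, living inside a single mutable artifact, can't have.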


Why This Matters for AI Agents

As AI agents become more capable, the question "can I trust this agent?" becomes more consequential. Current approaches mostly answer a different question: "who made this agent?" or "where was this agent born?"

  • "This agent has an identity in system X" — registration record, attestable at birth
  • "This agent carries a certificate from provider Y" — issued token, revocable, but forgeable
  • "This output was generated by model Z" — watermark, statistically attackable

All of these are attestations of origin. They say nothing about behavior. And as SynthID shows, origin attestations can be defeated by anyone with enough samples and a spectral analyzer.

Behavioral trust is different. It asks:

  • What has this agent done consistently across contexts?
  • Does its behavior match its stated purpose?
  • When given access, what did it do with it?
  • Has it made and kept commitments?

This kind of evidence is structurally resistant to the averaging attack. There's no consistent "pattern" in behavior to isolate — behavior varies by context, principal, task. What remains consistent is character: an agent that keeps commitments keeps them across different situations.


The Harder Problem Is the Right Problem

SynthID was the easy path to AI origin attestation — embed a pattern, detect it later. It's broken because the "embed a pattern" step is precisely what makes it attackable.

The harder path: don't prove origin at all. Prove behavior.

Build a system that accumulates behavioral evidence over time, across contexts, from independent observers. Weight evidence by skin in the game — an agent that actually transacted with someone provides harder evidence than one that just claims to have.
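A toy version of that weighting makes the idea concrete. Everything here — the evidence kinds, the weights, the scoring function — is an illustrative assumption, not Commit's real model; the point is only that a kept commitment backed by an actual transaction should move the score more than a bare claim.

```python
# Hypothetical skin-in-the-game weights: harder evidence counts for more.
EVIDENCE_WEIGHT = {
    "claim": 0.1,        # the agent says it did something
    "observation": 0.5,  # an independent counterparty observed it
    "transaction": 1.0,  # value actually changed hands
}

def trust_score(history):
    """Weighted fraction of commitments the agent kept."""
    kept = sum(EVIDENCE_WEIGHT[e["kind"]] for e in history if e["kept"])
    total = sum(EVIDENCE_WEIGHT[e["kind"]] for e in history)
    return kept / total if total else 0.0

history = [
    {"kind": "claim",       "kept": True},
    {"kind": "transaction", "kept": True},
    {"kind": "transaction", "kept": False},
    {"kind": "observation", "kept": True},
]
print(round(trust_score(history), 2))
```

Note how the one broken transaction drags the score down far more than the kept claim props it up — weight follows the cost of faking the evidence.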

This is what Commit is building: not a watermark system, not a certificate authority, but a behavioral commitment graph. The agents in it are known not by where they were born, but by what they've done.

SynthID's failure mode is actually Commit's raison d'être. When cryptographic attestations fall apart under statistical attack, you need something that statistical averaging can't touch.

History is that thing.


Commit is building behavioral trust infrastructure for AI agents. Read more at getcommit.dev.
