This essay is an answer to the critic who demands: "Stop telling fairy tales about AI helping science. Show me the receipts." Fair enough. Without receipts, stories of AI's triumphs sound like cult literature.
In February 2026, Google posted a 151-page preprint on arXiv. Fifty authors from Carnegie Mellon, Harvard, MIT, EPFL, and a dozen other institutions. The title: "Accelerating Scientific Research with Gemini: Case Studies and Common Techniques." Modest title. Immodest content.
Preprints about AI capabilities appear daily. Most are benchmarks: the model scored 94.7% instead of last year's 93.2%, please clap. This document is different. Real researchers describe how they spent months battering against an open problem, then fed it to Gemini Deep Think—and received, as if by some conjuring trick, a solution. Or a counterexample. Or a pointer to a theorem from an entirely different branch of mathematics they had never encountered.
Some of these stories deserve telling.
Cryptography has its own Holy Grail: constructing a SNARG from standard assumptions.
SNARG stands for Succinct Non-interactive ARGument. It lets you prove that a computation was performed correctly, where the proof's size and verification time are exponentially smaller than the computation itself. You submit a transaction; the blockchain receives a tiny certificate of purity. Without SNARGs (or rather, their close relatives, zk-SNARKs), there would be no Zero-Knowledge rollups, no meaningful Ethereum scaling. This is critical infrastructure.
The problem: all working constructions rely either on idealized models like the random oracle, or on assumptions cryptographers call "unfalsifiable." Building your house on sand is unpleasant. You want bedrock.
In autumn 2025, a preprint by Guan and Yogev appeared on Cryptology ePrint: a SNARG for all of NP, built solely on LWE. LWE (Learning With Errors) is a standard assumption from lattice-based cryptography, the backbone of post-quantum security standards. If the construction worked, it would be like finding the philosopher's stone.
Google researchers decided to unleash Gemini on the paper.
But not with a naive "verify this proof": such prompts yield superficial results. The model tends to praise its master, compliment the structure of their glorious scientific work, and catch typos with variable success. To fight these effects, the researchers used a five-step adversarial self-correction protocol: the model generates a review, then criticizes its own findings for hallucinations, refines the arguments, criticizes again, and produces a final version.
This algorithm resembles my Discovery Prompt, newer versions of which I post on my Telegram channel 1red2black. The main difference: they didn't try to cram everything into one message and exploit thinking-mode effects. They ran the phases honestly, as separate prompts.
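To make the protocol concrete, here is a minimal sketch of that loop. The `call_model` function is a hypothetical stand-in for whatever LLM API you use, and the prompt wording is mine, not Google's.

```python
# A minimal sketch of the five-step adversarial self-correction protocol:
# generate -> criticize -> refine -> criticize -> final version.
# `call_model` is a hypothetical stand-in for an LLM API call.

def call_model(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def adversarial_review(paper_text: str) -> str:
    # Step 1: generate an initial review.
    review = call_model(
        "Review the following paper for correctness. Quote the exact "
        "definitions and lemmas you rely on.\n\n" + paper_text
    )
    # Steps 2-5: two rounds of criticize-then-refine, each as a separate prompt.
    for _ in range(2):
        critique = call_model(
            "List every claim in this review that might be a hallucination "
            "or a misreading of the paper. Be adversarial.\n\nReview:\n" + review
        )
        review = call_model(
            "Rewrite the review, fixing or dropping the flagged claims.\n\n"
            "Review:\n" + review + "\n\nCritique:\n" + critique
        )
    return review
```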
The model found a hole.
In Definition 4.1 (I cite section numbers in case you want to read the actual paper), the authors required perfect consistency: if two proofs agree on some "local view," their "shadows" (compressed representations) must be identical for every value of the randomness parameter. In the construction from Section 4.3, they achieved only statistical consistency: shadows agree with high probability, but "bad" randomness values exist where they don't.
The difference seems like a technical quibble; for most practical purposes, statistical consistency would be enough. But the entire security proof relied on the strong version. The weak version lets an attacker enumerate randomness values, find a specific "bad" one, and break the whole thing.
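To see why the gap matters, here is a toy illustration in Python. It is not the actual construction: the shadow function and the single planted "bad" value are made up. But it shows why "identical for all r" and "identical with high probability over r" are worlds apart once the attacker gets to choose r.

```python
# Toy model: two proofs share the same local view, so perfect consistency
# demands identical shadows for EVERY randomness value r. A statistical
# guarantee tolerates a few bad r's -- and an adversary does not sample r,
# it enumerates.
BAD_R = 123_456  # one planted bad value out of a million: statistically negligible

def shadow(proof: str, r: int) -> str:
    # Made-up compressed representation of a proof under randomness r.
    if r == BAD_R and proof == "proof_B":
        return "divergent"
    return "common-shadow"

# Honest sampling hits a mismatch with probability one in a million.
# The attacker just scans.
bad_r = next(r for r in range(1_000_000)
             if shadow("proof_A", r) != shadow("proof_B", r))
print("attacker found bad randomness:", bad_r)  # 123456
```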
The finding was sent to independent experts—Aayush Jain and Zhengzhong Jin. They confirmed: the model was right. The original preprint's authors acknowledged the error and updated the paper on ePrint with a red banner at the top: "A gap has been found in the proof of the main theorem."
A neural network found a fatal bug in a cryptographic paper that human expert reviewers had missed.
Karthik C.S. from Rutgers University works in computational geometry. He was interested in a conjecture about Steiner trees.
A Steiner tree is a minimal tree connecting given points in space. Unlike a minimum spanning tree, you're allowed to add intermediate points (Steiner points), which can reduce total length. The problem is NP-hard, but approximation algorithms exist.
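A minimal numeric example (mine, not one from the paper): for the three corners of a unit equilateral triangle, adding a single Steiner point, here the centroid, which is also the Fermat point of that triangle, beats any tree built on the corners alone.

```python
import math

# Three corners of an equilateral triangle with side 1.
a = (0.0, 0.0)
b = (1.0, 0.0)
c = (0.5, math.sqrt(3) / 2)

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Best spanning tree on the corners alone: two edges of length 1.
mst_cost = dist(a, b) + dist(b, c)

# Add one Steiner point at the centroid (the Fermat point of this triangle).
steiner = ((a[0] + b[0] + c[0]) / 3, (a[1] + b[1] + c[1]) / 3)
steiner_cost = sum(dist(steiner, p) for p in (a, b, c))

print(mst_cost)      # 2.0
print(steiner_cost)  # ~1.732, i.e. sqrt(3): the Steiner point saves about 13%
```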
The conjecture that interested Karthik: among all graphs with m edges embedded in Euclidean space in a certain way, the minimum Steiner tree cost is achieved by the star graph. Proving this conjecture would be a step toward understanding the complexity of high-dimensional problems. Years of attempts had produced nothing.
Karthik asked a colleague to formulate a prompt and upload the paper to Gemini. The model proposed two approaches.
The first and most obvious: local graph transformations, step by step approaching the star, without increasing Steiner tree cost. The researchers had already tried this. Dead end.
The second approach was based on Kirszbraun's theorem.
Kirszbraun's theorem—a result from functional analysis, vintage 1934. It states: if you have a Lipschitz function between subsets of Hilbert spaces, you can extend it to the entire space while preserving the Lipschitz constant.
Sounds abstract. The meaning is simple: a "contracting" map between parts of spaces can be extended to a "contracting" map between whole spaces.
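For readers who want the symbols rather than the gloss, here is the standard formulation (my rendering, not a quotation from the paper):

```latex
% Kirszbraun's theorem: let H_1, H_2 be Hilbert spaces, S \subseteq H_1, and
% f : S \to H_2 an L-Lipschitz map. Then f extends to all of H_1 with the
% same Lipschitz constant.
\[
  \|f(x) - f(y)\| \le L\,\|x - y\| \quad \forall\, x, y \in S
  \;\;\Longrightarrow\;\;
  \exists\, \tilde f : H_1 \to H_2,\quad
  \tilde f\big|_S = f,\quad
  \|\tilde f(x) - \tilde f(y)\| \le L\,\|x - y\| \quad \forall\, x, y \in H_1 .
\]
```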
Karthik knew about various extension theorems; he had worked with fixed-point theorems in communication complexity, a branch of theoretical computer science that studies how much information parties must exchange to compute something together. But the connection between Kirszbraun and Steiner trees? He had never seen it. To his knowledge, no one had.
Then came a fork typical of these stories. Initially, the model rejected its own approach as too fancy. Something in its training apparently favored elementary proofs over heavy machinery. A reasonable heuristic, a way to save datacenter compute. But in this case—a false trail.
Karthik clarified: "I don't need an elementary proof."
The model pivoted. It formalized the available arguments. Built a mapping from any graph to a star graph. Showed this mapping was 1-Lipschitz (doesn't increase distances). Applied Kirszbraun's theorem to extend it to Steiner points. Concluded that the star's tree cost cannot exceed the original graph's.
Conjecture proved.
Let me quote the mathematician directly, so I'm not putting words in his mouth:
"Through this process, I have learned about the power of the Kirszbraun Extension Theorem for Steiner tree computation and analysis," Karthik writes in his testimonial. "To the best of my knowledge, this is a new connection."
An expert in computational geometry learned new mathematics from a language model.
Physicists in Michael Brenner's group at Harvard were working on an integral related to the spectrum of cosmic strings.
Cosmic strings are hypothetical one-dimensional topological defects that may have formed during phase transitions in the early universe. Interest in them surged after Pulsar Timing Array observations detected a stochastic gravitational-wave background. The source might be cosmic strings.
The integral describing loop formation had resisted decades of theoretical effort. Researchers couldn't even nail down the asymptotic behavior of a key coefficient.
The model produced an explicit analytic formula. Previously unknown.
Verification took several paths. Numerical comparison with existing simulation data: the formula matched. Symbolic verification by the original expert: everything checked out. Derivation published with a citation to Gemini as co-author.
A language model derived a formula in theoretical physics that humans had sought for decades.
Another case involved submodular optimization—a field at the intersection of combinatorics, economics, and machine learning. Submodular functions model diminishing returns: each additional element contributes less than the previous one. Classic application: optimal placement of sensors, where each new sensor adds less coverage.
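Here is a concrete miniature of that diminishing-returns property, using a made-up coverage function (my example, not one from the paper):

```python
# Each sensor covers a set of locations; the value of a set of sensors is the
# number of distinct locations covered. The marginal gain of the same sensor
# shrinks as the set it joins grows -- that shrinking is submodularity.
coverage = {
    "s1": {1, 2, 3},
    "s2": {3, 4},
    "s3": {4, 5, 6},
}

def f(sensors):
    covered = set()
    for s in sensors:
        covered |= coverage[s]
    return len(covered)

print(f({"s2"}) - f(set()))                      # gain of s2 on its own: 2
print(f({"s1", "s2"}) - f({"s1"}))               # gain of s2 after s1: 1
print(f({"s1", "s3", "s2"}) - f({"s1", "s3"}))   # gain of s2 after s1 and s3: 0
```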
A research team had a paper with several conjectures about online submodular welfare maximization. One involved a probabilistic inequality—a bound on expected marginal gains.
Then came a genuine zero-shot. One prompt. No dialogue.
The model chose exactly this conjecture (not the most obvious in the paper!). Built a counterexample: 3 elements, 2 agents, specific submodular functions (a table of values on all subsets). Checked all 3! = 6 permutations. Computed left and right sides of the inequality: 122.6/6 > 121.8/6.
Conjecture refuted.
Human researchers independently verified the arithmetic. Everything added up.
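For the skeptical reader, this is the shape of that check. The paper's actual value tables and inequality are not reproduced here; `lhs` and `rhs` are placeholders you would fill in from the counterexample. The point is only that six permutations of three elements are trivially enumerable.

```python
from itertools import permutations

ITEMS = ("a", "b", "c")

def lhs(order):
    # Placeholder: the conjectured inequality's left-hand quantity for this
    # arrival order, computed from the counterexample's value tables.
    raise NotImplementedError

def rhs(order):
    # Placeholder: the corresponding right-hand quantity.
    raise NotImplementedError

def check_conjecture():
    orders = list(permutations(ITEMS))               # all 3! = 6 arrival orders
    left = sum(lhs(o) for o in orders) / len(orders)
    right = sum(rhs(o) for o in orders) / len(orders)
    return left, right, left <= right                # False means the bound fails

# In the reported counterexample the averages came out to 122.6/6 and 121.8/6,
# so the conjectured direction of the inequality fails.
```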
The document's authors formulate something like a toolkit for working with AI in theoretical research. I'll paraphrase.
Iterative refinement. The model rarely solves a problem on the first try. Success comes through dialogue: refining the formulation, pointing out errors, providing scaffolding—high-level structure for the model to fill with details.
Cross-pollination. Models have digested the literature of every field. They find connections that experts miss, because each human expert is trapped in their own narrow specialty. Stone-Weierstrass for Max-Cut (functional analysis → approximation algorithms). Kirszbraun for Steiner trees (functional analysis → computational geometry). The Bethe approximation for permanents (statistical physics → graph theory).
Context de-identification. Sometimes the model refuses to attack a problem it recognizes as an "open problem." The counterintuitive solution: strip the context. Remove all information about the open problem's history. Leave only the statement and definitions. Less context, better results.
Neuro-symbolic loops. The model proposes a formula; code verifies; errors return to context. Automatic pruning of dead branches without human involvement.
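A minimal sketch of such a loop, assuming a hypothetical `propose_formula` call to the model and a numeric oracle (simulation or quadrature) to test against; nothing here is Google's actual pipeline.

```python
import math

def propose_formula(prompt: str) -> str:
    # Hypothetical LLM call; expected to return a Python expression in x,
    # e.g. "math.sin(x) / x".
    raise NotImplementedError

def numeric_truth(x: float) -> float:
    # Ground truth from simulation or numerical integration.
    raise NotImplementedError

def refine_loop(base_prompt: str, test_points, tol=1e-6, rounds=10):
    prompt = base_prompt
    for _ in range(rounds):
        expr = propose_formula(prompt)
        # eval() is acceptable for a sketch with trusted input; not for production.
        candidate = lambda x: eval(expr, {"math": math, "x": x})
        errors = [(x, candidate(x) - numeric_truth(x)) for x in test_points]
        if all(abs(e) < tol for _, e in errors):
            return expr                              # the symbolic candidate survives
        # Dead branch: feed the failures back into the next prompt.
        prompt = base_prompt + f"\nYour formula {expr} fails at: {errors[:3]}"
    return None
```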
Adversarial self-correction. For review: generation → self-criticism for hallucinations → refinement → repeated criticism → final version.
The authors are honest about limitations.
Confirmation bias. If you formulate a false conjecture as true and ask for a proof, the model will try to close all logical gaps with confident, handwavy arguments. A neutral prompt ("prove or disprove") helps, but guarantees nothing.
Confident hallucinations. Models handle high-level structure well but can forget constraints, confuse inequality signs, misapply theorems. In the Courtade-Kumar case (information theory), the model repeatedly confused bounds in hypercontractivity inequalities. Human verification is mandatory.
Alignment friction. Safety constraints often obstruct research. The model refuses to tackle a problem it recognizes as "open" or "too ambitious." You have to strip context or rephrase.
There's an observation the authors make near the end that deserves separate attention.
If AI radically reduces the suffering involved in producing technically dense papers, and such papers start to flood out, then the bottleneck of science shifts from creation to verification.
Peer review is already overloaded. Reviewers work for free. Deadlines slip. A torrent of AI-assisted literature will break an already barely functioning process.
But the cryptography example shows that AI, with properly configured prompts, processes, and protocols, can find barely visible flaws even in proofs by prominent experts. The same tools can be used to review work in other fields.
But who verifies the verifiers?
And the next question: if a model writes a paper and another model reviews it, where in this cycle is the human? Do we even need a human?
Let's address the elephant in the room. The document was written by Google employees about the capabilities of a Google model. The conflict of interest is obvious.
The research uses a special non-public, advanced version of Gemini Deep Think, unavailable outside Google. Reproducibility with ordinary tools is a big question mark.
The paper describes successes. How many failures were there? What's the success rate? One breakthrough per hundred prompts, or ten? Unknown.
Where does "writing a paper with AI help" end and "writing a paper as a human" begin? In Karthik's case, the human rephrased the prompt to make the model work better. Is the good result his contribution, or the model's? The boundary is blurred.
One researcher describes the model as "a tireless, educated, creative, and gifted junior colleague." This is probably more accurate than grand claims about "reasoning ability" or "discovery."
A junior colleague who never sleeps, has read all the literature, and finds non-obvious connections between fields. Who sometimes hallucinates, but sometimes brilliantly guesses. Who, unfortunately, must be checked at every step. Who cannot be trusted, but can be worked with.
Lance Fortnow—author of "The Golden Ticket," one of the most recognizable names in complexity theory—"vibe-coded" an entire research paper in eight prompts. It felt wrong, he wrote, like he had cheated somehow. But perhaps the difference is that Fortnow understands and is conscious of what he's doing. The model is not. Not yet.
Maybe this is the boundary. Here runs the line between "exceptional junior" and something greater. Between a tool that finds Kirszbraun's theorem at the right moment, and a being that understands why it was needed there.
Or maybe in ten years we'll laugh at this distinction, as we laugh at 1980s fears that computers would take programmers' jobs.
Of course, computers did take jobs from the women who operated punch cards. But the number of programmers only grew.
arXiv:2602.03837v1, Woodruff et al., "Accelerating Scientific Research with Gemini: Case Studies and Common Techniques," February 2026