Anup Karanjkar

Posted on May 29 • Originally published at wowhow.cloud

DeepMind AlphaProof Nexus: 9 Erdős Math Problems Solved, AGI 2029


On May 21, 2026, Google DeepMind published an arXiv paper announcing that its AlphaProof Nexus agent had autonomously solved nine open Erdős problems, some of which had remained unsolved for over 56 years, at a cost of just a few hundred dollars per problem. The same week, speaking on the sidelines of Google I/O 2026, DeepMind CEO Demis Hassabis said that humanity is standing in the "foothills of the singularity" and moved his AGI estimate to "around 2030, plus or minus a year", a window that explicitly includes 2029 as a real possibility.

These two events are connected. AlphaProof Nexus represents a qualitative leap in what AI can do in formal mathematics — one of the domains historically considered hardest for machines because proofs require not just pattern recognition but rigorous, verifiable logical reasoning. For developers building scientific computing tools, research assistants, or any system that touches mathematical verification, understanding what AlphaProof Nexus is and how it works is now necessary background knowledge.

Who Was Paul Erdős and Why Do His Problems Matter?

Paul Erdős (1913–1996) was a Hungarian mathematician who published more papers than almost any mathematician in history — over 1,500 — and collaborated with hundreds of researchers across combinatorics, number theory, and graph theory. He was famous for offering cash prizes for problems he considered important but could not solve himself, ranging from $25 to $10,000 depending on difficulty.

The "Erdős problems" are a collection of hundreds of open conjectures he posed during his lifetime, spanning combinatorics, prime number theory, graph theory, and discrete geometry. Many appear deceptively simple — the kind of statement a non-mathematician could understand — yet have resisted proof by the best human mathematicians for decades. Solving even one is considered a significant academic achievement. AlphaProof Nexus solved nine in a single research effort.

The paper also includes 44 proofs of conjectures from the OEIS (On-Line Encyclopedia of Integer Sequences), the canonical database of integer sequences used in mathematics and computer science. These are not toy problems. OEIS conjectures are recorded precisely because researchers believed them to be true but could not prove them.

How AlphaProof Nexus Works: Architecture Deep Dive

AlphaProof Nexus is an agentic system built on two complementary components: the reasoning power of a large language model and the unforgiving rigor of Lean, an interactive theorem prover and programming language originally developed at Microsoft Research and now maintained as an open-source project widely used in academic mathematics.

The architecture operates in a closed loop that eliminates hallucination at the proof level:

The Gemini 3.1 Pro backbone proposes a proof strategy in natural language and pseudocode
The strategy is translated into Lean's formal syntax
Lean's proof checker automatically verifies or rejects the proof — no partial credit, no hallucinated leaps allowed
If rejected, the system receives the exact compiler error message and iterates, refining its approach
The loop repeats until the proof compiles — which means it is formally verified to be mathematically correct
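The loop above can be sketched in a few lines of Python. This is a minimal illustration, not DeepMind's code: `propose` and `check` are hypothetical stand-ins for the Gemini call and the Lean checker, and the toy verifier simply rejects wrong guesses with a precise error message that the next attempt consumes.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CheckResult:
    ok: bool
    error: str = ""


def prove_with_feedback(
    propose: Callable[[str, str], str],
    check: Callable[[str], CheckResult],
    goal: str,
    max_iters: int = 10,
) -> Optional[str]:
    """Propose, verify, and feed the verifier's exact error back until accepted."""
    feedback = ""
    for _ in range(max_iters):
        candidate = propose(goal, feedback)
        result = check(candidate)
        if result.ok:
            return candidate  # only verifier-accepted output ever leaves the loop
        feedback = result.error  # the verbatim error drives the next attempt
    return None  # budget exhausted: no verified answer, rather than a guess


# Hypothetical toy stand-ins: find x with x * x == 49.
def toy_propose(goal: str, feedback: str) -> str:
    # Start at 1; on "too small: k", try k + 1 (a real proposer would be an LLM).
    if feedback.startswith("too small: "):
        return str(int(feedback.split(": ")[1]) + 1)
    return "1"


def toy_check(candidate: str, target: int = 49) -> CheckResult:
    x = int(candidate)
    if x * x == target:
        return CheckResult(ok=True)
    kind = "too small" if x * x < target else "too large"
    return CheckResult(ok=False, error=f"{kind}: {x}")


print(prove_with_feedback(toy_propose, toy_check, "x * x == 49"))  # 7
```

The key design choice is that the function returns `None` when the budget runs out instead of returning its last unverified candidate.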

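This matters because Lean is not a fuzzy evaluator. A proof either compiles in Lean or it does not, and the checker, not a human grader, is the ground truth. The guarantee comes with two conditions: the proof must not rely on `sorry` placeholders or extra axioms (`#print axioms` reveals both), and the formal theorem statement must faithfully capture the original conjecture, which still calls for human review. With those checks done, AlphaProof Nexus's results are not claims resting on a reviewer's reading of an informal argument; they are machine-verifiable formal proofs that any Lean user can check independently right now.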
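To make "compiles or it does not" concrete, here is a tiny Lean 4 file that uses only the core library (no Mathlib). The theorem names are illustrative. The kernel accepts the first two declarations and would reject the commented-out one; the last declaration shows why a compiling file still needs a `sorry` and axiom check.

```lean
-- Accepted: `Nat.add_comm a b` has exactly the stated type, so the kernel checks it.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- Accepted: both sides evaluate to the same natural number, so `rfl` type-checks.
example : 2 + 2 = 4 := rfl

-- Rejected if uncommented: `rfl` cannot prove a false equation,
-- so Lean reports an error instead of accepting the proof.
-- example : 2 + 2 = 5 := rfl

-- Caveat: a proof that ends in `sorry` still compiles, but Lean warns
-- "declaration uses 'sorry'", and `#print axioms` lists `sorryAx`.
theorem not_really_proved (n : Nat) : n + 0 = n := by
  sorry

#print axioms not_really_proved
```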

The Agent Variants: From Parallel to Evolutionary

The paper describes multiple agent configurations with increasing levels of sophistication.

Agent A — the baseline configuration — runs multiple independent sub-agents in parallel. Each sub-agent uses Gemini 3.1 Pro to generate Lean proof code for a target theorem, receives compiler error messages when the proof fails, and iterates. This embarrassingly parallel approach scales well: failed attempts are cheap, so you spawn many agents with different strategies simultaneously and take the first one that succeeds.
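A minimal sketch of the "spawn many, take the first success" pattern, assuming each attempt is an independent, blocking call (for example, an LLM request followed by a compile). `attempt` and `toy_attempt` are hypothetical; a real system would also cap total spend.

```python
import concurrent.futures as cf
from typing import Callable, Optional, Sequence


def first_success(
    attempt: Callable[[int], Optional[str]],
    strategies: Sequence[int],
    max_workers: int = 8,
) -> Optional[str]:
    """Run independent attempts in parallel and return the first non-None result."""
    with cf.ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(attempt, s) for s in strategies]
        for fut in cf.as_completed(futures):
            result = fut.result()
            if result is not None:
                # Cancel attempts that have not started yet; running ones finish.
                for other in futures:
                    other.cancel()
                return result
    return None  # every attempt failed


def toy_attempt(seed: int) -> Optional[str]:
    # Hypothetical: only strategy 7 "finds" a proof; the rest fail cheaply.
    return f"proof-from-strategy-{seed}" if seed == 7 else None


print(first_success(toy_attempt, range(10)))  # proof-from-strategy-7
```

Threads suit this shape because each attempt mostly waits on network I/O. Note that `cancel()` only stops attempts that have not started, so already-running attempts complete before the pool shuts down.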

The more sophisticated evolutionary framework applies selection pressure across agent populations, routing the most promising intermediate proof states toward further exploration while pruning dead ends. It builds on DeepMind's earlier AlphaProof work, which, together with AlphaGeometry 2, reached silver-medal standard at the 2024 International Mathematical Olympiad. (DeepMind's gold-medal-level IMO result in 2025 came from an advanced Gemini Deep Think model working in natural language, not from AlphaProof.)
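The selection-and-pruning idea can be illustrated with a generic evolutionary loop. This is a textbook sketch, not the paper's algorithm: candidates are bit strings standing in for proof states, `score` stands in for the evaluator model, and `mutate` stands in for asking the generator to revise a promising state.

```python
import random
from typing import Callable, List, Tuple


def evolve(
    init: List[str],
    score: Callable[[str], float],
    mutate: Callable[[str, random.Random], str],
    is_solved: Callable[[str], bool],
    survivors: int = 4,
    pop_size: int = 16,
    generations: int = 200,
    seed: int = 0,
) -> Tuple[str, int]:
    """Keep the top-scoring candidates, refill with mutants, stop when solved."""
    rng = random.Random(seed)
    population = list(init)
    for gen in range(generations):
        # Selection: rank candidates and keep only the most promising ones.
        population.sort(key=score, reverse=True)
        if is_solved(population[0]):
            return population[0], gen
        parents = population[:survivors]  # everything else is pruned
        # Variation: refill the population with mutated copies of survivors.
        children = [mutate(rng.choice(parents), rng) for _ in range(pop_size - survivors)]
        population = parents + children
    population.sort(key=score, reverse=True)
    return population[0], generations  # best effort, not necessarily solved


TARGET = "1" * 12


def flip_one_bit(s: str, rng: random.Random) -> str:
    i = rng.randrange(len(s))
    return s[:i] + ("1" if s[i] == "0" else "0") + s[i + 1:]


best, generation = evolve(
    init=["0" * 12],
    score=lambda s: s.count("1"),
    mutate=flip_one_bit,
    is_solved=lambda s: s == TARGET,
)
print(best, generation)
```

Because survivors are carried over unchanged, the best score never decreases from one generation to the next; mutation only has to find improvements, not preserve them.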

For evaluation and match rating — determining which partial proofs are worth pursuing further — the system uses Gemini 3.0 Flash rather than 3.1 Pro. Flash is significantly faster and cheaper, making it appropriate for the high-throughput, lower-stakes work of ranking candidate proof states. This two-tier model architecture (expensive model for reasoning, fast model for evaluation) is a pattern worth internalizing for any developer building production agent systems with tight cost constraints.
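One way to express the two-tier split, as a hypothetical sketch: a cheap scorer (the Flash role) rates every candidate state, and the expensive generator (the Pro role) expands only the top few. The function names and the `beam` parameter are illustrative assumptions, and the toy scorer simply prefers longer strings.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CallCounter:
    pro: int = 0    # expensive generator calls
    flash: int = 0  # cheap evaluator calls


def expand_step(
    states: List[str],
    cheap_score: Callable[[str], float],
    expensive_expand: Callable[[str], List[str]],
    counter: CallCounter,
    beam: int = 2,
) -> List[str]:
    # Tier 2 (cheap, high volume): score every candidate state.
    scored = []
    for s in states:
        counter.flash += 1
        scored.append((cheap_score(s), s))
    scored.sort(reverse=True)
    # Tier 1 (expensive, low volume): expand only the top `beam` states.
    next_states: List[str] = []
    for _, s in scored[:beam]:
        counter.pro += 1
        next_states.extend(expensive_expand(s))
    return next_states


counter = CallCounter()
states = ["x" * i for i in range(1, 11)]
nxt = expand_step(states, cheap_score=len,
                  expensive_expand=lambda s: [s + "a", s + "b"], counter=counter)
print(counter)  # CallCounter(pro=2, flash=10)
```

Ten cheap calls buy the right to make only two expensive ones, which is the cost shape the paragraph above describes.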

The Cost Revolution: $300 vs. 56 Years of Human Effort

The economic angle is as striking as the mathematical achievement. Human mathematical research is extraordinarily expensive when you factor in PhD training, researcher salaries, conference travel, and decades of false starts. A single open Erdős problem might represent the accumulated effort of dozens of researchers over many years with no resolution.

AlphaProof Nexus solved nine of them at inference costs of a few hundred dollars each. The paper does not provide exact per-problem figures, but based on Gemini 3.1 Pro pricing and typical token consumption for complex multi-turn reasoning tasks, the effective cost is likely in the $200–$500 range per solved problem — including all failed attempts across parallel agents.
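Because the paper gives no per-problem figures, every number below is purely hypothetical, including the token prices. The sketch shows the accounting that matters: the cost of a solved problem has to include every failed parallel attempt, not just the successful one.

```python
from typing import Iterable, Tuple


def run_cost_usd(input_tokens: int, output_tokens: int,
                 usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Cost of one attempt, given per-million-token prices."""
    return (input_tokens * usd_per_m_input + output_tokens * usd_per_m_output) / 1_000_000


def cost_per_solved_problem(runs: Iterable[Tuple[int, int]], solved: int,
                            usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Total spend across ALL attempts, failed ones included, divided by problems solved."""
    if solved <= 0:
        raise ValueError("no problems solved: cost per solved problem is undefined")
    total = sum(run_cost_usd(i, o, usd_per_m_input, usd_per_m_output) for i, o in runs)
    return total / solved


# Hypothetical: 50 parallel attempts, each reading 2M tokens and writing 200k,
# at made-up prices of $2 (input) and $12 (output) per million tokens.
attempts = [(2_000_000, 200_000)] * 50
print(cost_per_solved_problem(attempts, solved=1,
                              usd_per_m_input=2.0, usd_per_m_output=12.0))  # ≈ 320
```

Each attempt costs about $6.40 here, so a figure of a few hundred dollars per solved problem is consistent with dozens of cheap, mostly failing attempts.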

This is not just a striking datapoint. It represents a fundamental shift in the economics of mathematical research. Problems that would previously require a research grant, a postdoctoral position, and years of effort can now be attempted at startup compute budgets. Universities and research labs that adopt this infrastructure gain a qualitatively different capacity for mathematical exploration — one that is no longer gated by human researcher time.

All formal Lean proofs generated by AlphaProof Nexus are publicly available in the google-deepmind/alphaproof-nexus-results GitHub repository, updated between May 20 and 22, 2026. Each formal Lean proof is accompanied by a natural-language prose proof, making the results accessible to mathematicians who do not yet read Lean fluently.

Hassabis at Google I/O 2026: AGI in the "Foothills of the Singularity"

The AlphaProof Nexus announcement landed the same week as Demis Hassabis's most striking public statements about AI's near-term trajectory. Speaking on the sidelines of Google I/O 2026, Hassabis said:

"I've been saying, recently, around 2030, plus or minus a year, I think is a reasonable estimate, from what I'm seeing now."

He simultaneously described current AI agents as a "practice run" for more general capabilities and said humanity is standing in the "foothills of the singularity." He was careful to note that AlphaProof Nexus itself is "still not AGI" — the system is highly capable in formal mathematical reasoning but lacks the generality that would qualify as artificial general intelligence by any standard definition.

The 2029–2030 window is more aggressive than Hassabis's previous public statements, which had generally placed AGI in the "five to ten years" range. It is notably aligned with similar timelines offered by Sam Altman (2028) and Dario Amodei (2027 or shortly after). The convergence of AGI predictions from the leaders of the three largest frontier AI labs toward the late 2020s is itself a significant signal worth tracking.

For context: Elon Musk had predicted AGI as early as 2026, tracking against a more expansive definition of the term. The mainstream frontier lab definition — systems that can autonomously perform scientific research at or above human level across a broad range of domains — is what Hassabis, Altman, and Amodei are describing when they cite 2027–2030.

What This Means for Developers Building Today

AlphaProof Nexus is a research system, not a generally available product. You cannot call a Nexus API endpoint today. But the technologies it uses are increasingly accessible, and the architectural patterns it demonstrates are directly applicable to production systems.

The Closed-Loop Verification Pattern for AI Agents

The central architectural insight of AlphaProof Nexus — pairing an LLM with a formal verifier that rejects hallucinations rather than scoring them — applies far beyond mathematics. Any domain with a formal correctness checker can adopt this pattern:

Code generation: LLM writes code, compiler or test suite verifies correctness. Already widespread in AI coding tools like Claude Code and Cursor.
SQL generation: LLM generates queries, the database engine validates syntax and executes. Agentic SQL systems already use this approach.
TypeScript strict mode: Type checker as the verifier for LLM-generated TypeScript — the compiler is ground truth, not a human reviewer.
API contract validation: OpenAPI spec validation as the verifier for LLM-generated API calls.
Smart contract auditing: Formal verification tools as the final check layer for AI-generated contract code.
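As a concrete instance of the SQL case, here is a sketch that uses Python's built-in `sqlite3` module as the verifier. Prefixing a query with `EXPLAIN` makes SQLite compile it (parse it and resolve table and column names) without executing it, so a hallucinated column or table is rejected with an exact error message the agent can feed back. The schema is a made-up example.

```python
import sqlite3
from typing import Optional

# Hypothetical schema; in practice, load your real DDL.
SCHEMA = """
CREATE TABLE users  (id INTEGER PRIMARY KEY, email TEXT NOT NULL, created_at TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES users(id), total REAL);
"""


def check_sql(query: str) -> Optional[str]:
    """Return None if SQLite accepts the query against the schema, else the error text."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        # EXPLAIN compiles the statement to bytecode instead of running it.
        conn.execute("EXPLAIN " + query)
        return None
    except sqlite3.Error as exc:
        return str(exc)  # e.g. "no such column: ..." for a hallucinated column
    finally:
        conn.close()


print(check_sql("SELECT emial FROM users"))
```

This only shows the query is well-formed against the schema. It says nothing about whether the query answers the right question; that still needs tests or result checks.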

The mathematical proof use case is the most rigorous demonstration of this pattern because formal proofs have zero tolerance for errors. But the principle generalizes: wherever you have a machine-checkable ground truth, you can wire it into your agent loop and eliminate an entire class of hallucination failures.

Two-Tier Model Economics

The Gemini 3.1 Pro + Gemini 3.0 Flash split in AlphaProof Nexus is a concrete, working example of cost-optimized multi-model routing. Use the expensive, high-capability model for the reasoning step that generates novel output. Use the fast, cheap model for the evaluation and ranking steps that run at high volume on every iteration.

This pattern applies to any production agentic system that must balance output quality against inference cost at scale. The division of labor in AlphaProof Nexus, where Flash handles the high-frequency evaluation work while Pro handles the low-frequency creative reasoning, is a useful starting heuristic for designing your own agent architectures.
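A back-of-the-envelope comparison with hypothetical per-call prices shows why the split matters when evaluation calls dominate. Assume one generation call and 20 evaluation calls per iteration; none of these figures come from the paper or from published Gemini pricing.

```python
def cost_per_iteration(gen_calls: int, eval_calls: int,
                       gen_price_usd: float, eval_price_usd: float) -> float:
    """Spend for one loop iteration: generation calls plus evaluation calls."""
    return gen_calls * gen_price_usd + eval_calls * eval_price_usd


# Hypothetical per-call prices, NOT published Gemini pricing.
PRO_PER_CALL = 0.10
FLASH_PER_CALL = 0.005

all_pro = cost_per_iteration(1, 20, PRO_PER_CALL, PRO_PER_CALL)     # ≈ 2.10
two_tier = cost_per_iteration(1, 20, PRO_PER_CALL, FLASH_PER_CALL)  # ≈ 0.20
print(f"all-Pro: ${all_pro:.2f}  two-tier: ${two_tier:.2f}  "
      f"ratio: {all_pro / two_tier:.1f}x")
```

The saving grows with the evaluation-to-generation ratio, so it is worth measuring that ratio in your own loop before choosing models.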

Lean and Formal Verification Are Now Investable Skills

The Lean theorem prover has existed since 2013 and has seen gradual adoption in mathematics departments and specialized compiler and systems programming contexts. AlphaProof Nexus makes a clear argument that Lean is about to become significantly more important in the AI era.

If AI systems use Lean as their ground-truth verification layer — the mechanism that prevents mathematical hallucination — then developers building math-adjacent systems have a concrete reason to understand Lean basics. The Lean 4 documentation and Mathlib (the community-maintained library with over 200,000 formalized mathematical theorems) are the primary starting points. Mathlib contains many of the building blocks that AlphaProof Nexus used as foundations for its Erdős proofs.
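A minimal sketch of what building on Mathlib looks like, assuming a Lean 4 project with Mathlib as a dependency. Mathlib lemma names occasionally change between versions, so treat these as illustrative rather than pinned to any particular release.

```lean
import Mathlib

-- Reusing an existing Mathlib lemma: for every n there is a prime p with n ≤ p.
example (n : ℕ) : ∃ p, n ≤ p ∧ p.Prime :=
  Nat.exists_infinite_primes n

-- Letting a Mathlib tactic do the work: `norm_num` can certify small primes.
example : Nat.Prime 97 := by
  norm_num
```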

Caveats: What AlphaProof Nexus Does Not Prove

The results are significant, but several important caveats are worth noting before drawing broad conclusions from this paper.

First, the benchmarks are vendor-run. DeepMind conducted the evaluation on its own system and published a preprint — not yet a peer-reviewed journal paper. The Lean proofs are machine-verifiable and publicly available for independent checking, which provides stronger evidence than benchmark claims alone. But independent reproduction of the full agentic process, including cost and time figures, has not yet been reported by third parties.

Second, Erdős problems, while genuinely hard, are not the hardest open problems in mathematics. The Millennium Prize Problems, including the Riemann Hypothesis and P vs. NP, represent a different order of difficulty. Solving nine Erdős problems with a few hundred dollars of compute each does not mean those problems are within reach of current systems.

Third, producing a formal Lean proof that compiles is not the same as generating the kind of conceptual insight that mathematicians consider illuminating. A Lean proof derived via automated search over a large space of lemmas may be correct but unreadable. The accompanying prose proofs in the GitHub repository attempt to address this gap, but the question of whether AI mathematical work produces genuine mathematical understanding — versus mechanical proof search — remains open and philosophically contested.

The AI Math Race: What Is Coming Next

AlphaProof Nexus is one entry in a rapidly accelerating AI mathematics race. OpenAI has its own mathematical reasoning research track. Meta's open-source models have shown strong performance on formal proof tasks. Startups are building Lean-integrated tools specifically for research mathematicians.

The next milestones to watch: whether any AI system cracks a Millennium Prize Problem, whether formal proof AI gets integrated into mainstream mathematical software like Mathematica or Wolfram Alpha, and whether the open-source Lean and Mathlib ecosystem absorbs the AlphaProof Nexus approach into community tooling available to individual researchers.

For developers, the practical timeline is shorter than those milestones suggest. Tools that combine LLM reasoning with formal verification for code, contracts, and data pipelines are arriving in 2026. AlphaProof Nexus is a proof of concept at the hardest end of the difficulty spectrum. The same architecture transfers to production SQL generation or TypeScript codegen at dramatically lower cost, but the hallucination-rejection guarantee is only as strong as the verifier you plug in: a type checker rejects type errors, not wrong business logic.

How to Engage with This Technology Now

If you want to engage with the technology behind AlphaProof Nexus directly, here are the concrete starting points available today:

google-deepmind/alphaproof-nexus-results on GitHub: All formal Lean proofs and accompanying prose proofs from the paper, publicly available and verifiable
arXiv 2605.22763: The full paper — "Advancing Mathematics Research with AI-Driven Formal Proof Search" — with complete architecture and methodology details
Lean 4 official site (leanprover.github.io): Primary documentation and installation guide for the Lean theorem prover used by AlphaProof Nexus
Mathlib4: The community-maintained library of 200,000+ formalized mathematical theorems, which AlphaProof Nexus builds on as its mathematical foundation

For production applications, the key integration point is treating a formal verifier as a zero-tolerance filter in your agent loop. The LLM proposes; the verifier approves or rejects; the loop iterates. This eliminates every failure the checker is able to detect, in any domain where a machine-checkable correctness criterion exists, and most engineering domains have at least a partial one.

Conclusion

AlphaProof Nexus is the clearest demonstration yet that AI can do genuinely novel, formally verified mathematical work: not just solving textbook problems, but proving conjectures that professional mathematicians had left open, in some cases for over half a century, at a cost that makes the economics of mathematical research look fundamentally different.

Demis Hassabis's shift of his AGI timeline to 2029–2030 in the same week is not a coincidence. Systems that can autonomously prove Erdős conjectures for a few hundred dollars each are the same category of capability that feeds into serious AGI predictions. The core components (formal reasoning, verifiable correctness, and iterative refinement driven by failure feedback) are precisely the ones researchers believe will scale toward more general artificial intelligence.

Whether or not AGI arrives precisely in 2029, the direction is clear: AI mathematical reasoning is moving from benchmark performance to genuine research contribution. The closed-loop verification pattern AlphaProof Nexus demonstrates, the two-tier model economics it employs, and the Lean formal proof ecosystem it accelerates are all things developers should understand now — not when the next breakthrough lands.


DEV Community