Google launched Gemini for Science yesterday at I/O 2026, and it is the most interesting thing to come out of the keynote. Not the smart glasses, not the new subscription tier, not Gemini Spark promising to coordinate your Gmail and close your laptop. The science suite. Because it is the clearest statement yet of what Google actually thinks AI agents are for.
The pitch is direct: science is drowning in its own output. Millions of papers per year, petabytes of biological data, a growing gap between what the literature contains and what any individual researcher can hold in their head. Google's framing is that general agents, not narrow specialist models, are the right tool to close that gap. Gemini for Science is the first concrete product built on that argument.
There are three experimental tools.

Hypothesis Generation is built on top of something Google calls Co-Scientist. You feed it a research challenge, and it runs what the announcement describes as a "multi-agent idea tournament": multiple sub-agents generate hypotheses, debate them, and evaluate them against each other. The winning ideas come with clickable citations and, Google claims, deep verification.

Computational Discovery is an agentic engine built on AlphaEvolve that generates and scores thousands of code variations in parallel, designed to dramatically expand the number of hypotheses a lab can actually test.

Then there is Science Skills, a bundle that integrates data from more than 30 major life-science databases, including UniProt, AlphaFold, AlphaGenome, and InterPro. Google says teams using it have compressed complex structural bioinformatics analyses from hours to minutes, and that in early testing it produced novel insights into the mechanisms behind a rare genetic disease involving the AK2 gene.
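Google hasn't published how the Hypothesis Generation tournament actually ranks ideas, so to make the "idea tournament" concrete, here is a minimal sketch under my own assumptions. It uses Elo rating, one standard way to turn pairwise "A beat B" verdicts into a ranking. Whether Gemini for Science works this way is an assumption. The `judge` function is hypothetical: it stands in for a debating sub-agent, and in the demo it just compares made-up quality scores.

```python
from itertools import combinations
from typing import Callable

K = 32  # Elo step size; an assumed value, not anything Google has published


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def run_tournament(
    hypotheses: list[str],
    judge: Callable[[str, str], bool],  # hypothetical: True if a beats b
    rounds: int = 3,
    start_rating: float = 1200.0,
) -> list[tuple[str, float]]:
    """Round-robin Elo tournament; returns (hypothesis, rating), best first."""
    ratings = {h: start_rating for h in hypotheses}
    for _ in range(rounds):
        for a, b in combinations(hypotheses, 2):
            e_a = expected_score(ratings[a], ratings[b])
            s_a = 1.0 if judge(a, b) else 0.0
            # Zero-sum update: whatever A gains, B loses.
            delta = K * (s_a - e_a)
            ratings[a] += delta
            ratings[b] -= delta
    return sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical hidden quality scores; a real system would ask a model to debate.
    quality = {"H1": 0.2, "H2": 0.9, "H3": 0.5}
    toy_judge = lambda a, b: quality[a] > quality[b]
    for h, r in run_tournament(list(quality), toy_judge):
        print(h, round(r))
```

The design choice worth noticing is that nothing here checks whether a hypothesis is *true*. The tournament only measures which ideas win arguments against other ideas, so the ranking can be no better than the judge.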
These are still prototypes on Google Labs. I want to be clear about that. No peer-reviewed paper yet confirms the hypothesis quality. The "idea tournament" framing is evocative, but we don't know how it compares to a grad student with a lit review and a week of focused reading. The AK2 claim is intriguing and completely unverified by anyone outside Google.
But here is the thing that keeps pulling at me. The structure of the scientific method (generate a hypothesis, design a test, interpret results, iterate) is a loop, and that loop is a good fit for agentic systems in a way that most knowledge-work tasks are not. The catch is verification. Code review, document drafting, customer support: these are all things where a human can spot a bad output quickly. Bad science, on the other hand, can look convincing for years. The speed advantage is real; the verification problem is equally real, and Gemini for Science doesn't fully solve it.
Demis Hassabis closed his section of the keynote by telling the audience that, looking back on this moment, we would realize we were "standing in the foothills of the singularity." The Engadget live blog's response was "Lol, Lmao even." That reaction is fair. It is also slightly too easy.
I think Hassabis is wrong about the timeline and the framing, but he is pointing at something true. The part of science that is bottlenecked by synthesis, by the human inability to hold the full literature in working memory, is exactly the part that language models are good at. If the hypothesis tournament produces even one genuinely novel connection per hundred tries, one that a human researcher would not have found, the economics change. Not because AI is smarter than a scientist, but because it is faster, tireless, and can hold more context simultaneously.
The "foothills of the singularity" language is designed to generate attention. The quiet part of the announcement, the integration of more than 30 databases inside Google's Antigravity platform, is where the science will actually be tested, or not. That rollout is worth watching more carefully than the keynote rhetoric.