The AI landscape is seeing fierce competition among frontier models. New metrics such as Epoch AI's combined ECI score show exponential gains in how long models can reliably sustain a task. Gemini 3 Pro leads at ~4.9 hours, ahead of GPT-5.2 (~3.5 hours) and Claude Opus 4.5 (~2.6 hours), a sign that development continues unabated. This matches real-world reports: users say ChatGPT-5.2-Pro can keep working on a single complex problem for 85 minutes, pushing agentic capabilities forward even as persistent challenges remain.
Gemini also continues to flex on reasoning, beating GPT-5.2 on fresh benchmarks and suggesting that leadership is shifting toward Google. Meanwhile, in a stunning upset, Zoom, best known for its video-conferencing app, unveiled a state-of-the-art model that scores 48% on Humanity's Last Exam (HLE). Nobody expected a conferencing company to compete on that benchmark, which shows that AI breakthroughs are now coming even from companies outside the core AI field.
Google's Gemini Flash received major native audio upgrades that deliver noticeably sharper real-time translation, such as instantly dubbing English YouTube videos into German at unprecedented quality. That positions it for seamless global use, although some users have complained that the feature activates automatically. Separately, a text-to-speech demo in a blog post stunned listeners with its eerily human intonation, another leap in TTS realism that could reshape accessibility and content creation.
Perplexity AI released the first large-scale study of in-the-wild AI agent usage, analyzing hundreds of millions of anonymized sessions from its Comet browser and Comet Assistant. Key findings: 57% of queries fall into two categories, Productivity/Workflow and Learning/Research; high-frequency tasks show strong "stickiness," with users returning to them repeatedly; and adoption skews toward early adopters in wealthier nations and knowledge-work jobs, pointing to a potential "agent divide."
In line with broader efficiency trends, Perplexity CEO Aravind Srinivas announced that Comet Assistant will move its compute to fast, lightweight models that could potentially run locally, optimizing for speed and edge deployment as agent demand grows.
The model race shows no signs of plateauing. According to a post from the account @iruletheworldmo, next week is packed: OpenAI is releasing two image models, Google is launching Gemini 3 Flash (rumored to outperform Gemini 2.5 Pro, with potentially large market ripple effects), and xAI is releasing Grok 420 Blaze. After Christmas, OpenAI is reportedly shipping a more powerful and polished version of GPT-5.2, while Anthropic is said to be working on Claude Opus XXL. As the post put it, "there is no wall, only exponentials."
"next week isn’t slowing down. openai releasing two image models. gemini releasing flash. grok releasing grok 420 blaze. anthropic working on claude opus XXL after christmas openai are releasing a much more powerful and polished version of 5.2. there is no wall, only exponentials."
— 🍄🍄🍄 (@iruletheworldmo)
DeepMind co-founder Shane Legg, when pressed on whether LLMs show sparks of AGI, replied unequivocally:
"it's a lot more than sparks"
— Shane Legg
Hold on to your hats. Coming from one of the key architects of AGI research, this remark amplifies the narrative of rapid, transformative progress across benchmarks, agents, and apps, and sets the stage for major upheavals in 2026.