DEV Community

Mehul Goel


Google Fills the Sora Gap, Recursive Bets $650M on Self-Improving AI, and the 35-Hour Agent That Changes Everything — May 24, 2026

Imagine you hire a contractor to renovate your kitchen. You hand them the blueprints, leave for the weekend, and come back Monday to find the work done — not just the kitchen, but the bathroom too, because they noticed it needed it, and they also rewired a faulty circuit you didn't know about. You didn't ask for any of that. You didn't manage any of it. They just... kept working.

That's roughly what Alibaba's Qwen3.7-Max did last week. The model ran for 35 consecutive hours, optimizing kernel code completely on its own, without a single human check-in. No prompts mid-task. No course corrections. Just an AI that started a job and finished it — and then some. It's a small demo of something that feels quietly enormous: agents that don't need babysitting.

The contractor analogy only goes so far, though. A contractor stops when the job is done and goes home. The more interesting question, the one that keeps pulling at you, is what happens when the agent doesn't stop. What if, while it's running, it's also learning: building its own database of what worked and what didn't, and expanding its own capabilities in real time? That's not a contractor anymore. That's something closer to a colleague who gets better every single day, around the clock, with no sleep and no salary.

The infrastructure world is already racing to meet that future. Modal just raised $355 million at a $4.65 billion valuation, and its serverless GPU platform has grown fivefold in under a year. Modal isn't building models; it's building the plumbing that long-running agents will need to stay alive. Think of it like the electrical grid before refrigerators existed: someone had to build the grid first.

The self-improving angle has a name now, too. Recursive, a stealth startup led by Richard Socher (formerly chief scientist at Salesforce) and Tim Rocktäschel (previously at Meta AI and Google DeepMind), just raised $650 million at a $4.65 billion valuation from investors including Nvidia and AMD. Their pitch is recursive self-improvement: AI systems that make AI systems better. It's a concept that sounds like science fiction until you realize it's basically what humans do with tools. We built better hammers, then used those hammers to build factories, then used those factories to build machines that make hammers we couldn't have made by hand. The recursion has always been there. Recursive is just trying to compress the timeline dramatically. Reinforcement learning is a reasonable guess for how they'll attempt it: RL has already produced superhuman performance in domains with a clear reward signal, as AlphaGo and AlphaZero demonstrated in board games. The harder question is what the reward signal for "better AI" would even be, and who decides that. The $650 million bet says somebody believes Socher and Rocktäschel have an answer worth that much.
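Nothing public describes how Recursive actually plans to do this. To make the reward-signal question concrete, here is a toy sketch under loud assumptions: a plain hill-climbing loop (not full RL, and not Recursive's method) in which a hypothetical "system" is just a vector of numbers, and it keeps only the self-modifications that a reward function scores higher. The names `reward`, `self_improve`, and the hidden target are all illustrative.

```python
import random
from typing import Callable

Params = list[float]  # hypothetical "system": just a vector of numeric settings


def reward(params: Params) -> float:
    """Stand-in reward signal (an assumption): higher is better, peaking at a
    hidden target. Real "better AI" has no target this clean."""
    target = [0.5, -1.0, 2.0]
    return -sum((p - t) ** 2 for p, t in zip(params, target))


def self_improve(params: Params,
                 reward_fn: Callable[[Params], float],
                 rounds: int = 2000,
                 step_size: float = 0.1,
                 seed: int = 0) -> Params:
    """Hill climbing: propose a random tweak to the current best version and
    keep it only if reward_fn scores it strictly higher. The reward function
    alone defines what "better" means."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    best, best_score = list(params), reward_fn(params)
    for _ in range(rounds):
        candidate = [p + rng.gauss(0.0, step_size) for p in best]
        score = reward_fn(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best
```

Everything interesting lives in `reward_fn`. Pass in a different function and the exact same loop heads somewhere else entirely, which is the whole difficulty: swap in a benchmark score and the loop will faithfully optimize that benchmark, whether or not the result is "better AI" in any sense you care about.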

Meanwhile, Cohere dropped a 218-billion-parameter model called Command A+ under an Apache 2.0 license, and the reaction in some corners was a shrug. That's fair: parameter counts have become a weird flex in an industry that has quietly moved past them. The more interesting stat is that only 25 billion of those parameters are active for any given token, thanks to a mixture-of-experts (MoE) architecture. Think of it like a hospital with 200 doctors on staff, but only 25 in the building at once, and the right 25 always show up for the right patient. The rest are on call. That's what makes large MoE models practical to run: each token costs roughly the compute of the active parameters, even though all 218 billion still have to sit in memory. It also helps explain why OpenAI and Anthropic don't bother publishing parameter counts for GPT-4 or Claude. The number stopped being the point. What matters is what a model can do per dollar, per second, per query. Command A+ was built on a year of real enterprise deployments, which means it has been stress-tested on the boring, messy, document-heavy workflows that most benchmark leaderboards quietly ignore.
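The "only 25 billion active" part comes from routing. Here is a toy sketch of top-k mixture-of-experts routing in plain Python, assuming a linear router and experts that are arbitrary functions. It is not Cohere's implementation; the function names and the default k = 2 are illustrative. Every expert gets a score, but only the k best-scoring experts actually run, and their outputs are blended with softmax weights.

```python
import math
from typing import Callable

Vector = list[float]


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: Vector,
                gate_weights: list[Vector],
                experts: list[Callable[[Vector], Vector]],
                k: int = 2) -> Vector:
    """Route input x to the top-k experts and return their weighted output."""
    # Router: one score per expert (dot product of x with that expert's gate vector).
    scores = [sum(w * xi for w, xi in zip(gw, x)) for gw in gate_weights]
    # Keep only the k highest-scoring experts; the others are never called.
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    weights = softmax([scores[i] for i in top])
    out = [0.0] * len(x)
    for w, i in zip(weights, top):
        y = experts[i](x)  # only these k expert calls cost compute
        out = [o + w * yi for o, yi in zip(out, y)]
    return out
```

Real MoE layers do this per token inside each transformer block, with feed-forward networks as the experts. The saving is simply the expert calls that never happen.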

On the video side, Google launched Gemini Omni to fill the space that Sora left when OpenAI discontinued it. The YouTube integration is the detail worth paying attention to — not because the model is necessarily better than what existed, but because distribution is its own kind of moat. A video creation tool inside YouTube is like putting a recording studio inside Spotify. The people who already live on the platform don't have to go anywhere new. They just start creating. That kind of embedded access tends to compound in ways that standalone tools can't easily compete with.

The research out of Google DeepMind is doing something structurally different from all of this, and it's easy to miss if you're moving too fast through the headlines. AlphaProof Nexus solved 9 of 353 open Erdős problems (math puzzles that have stumped professional mathematicians for decades) and proved 44 sequence conjectures, each for a few hundred dollars of compute. The system pairs a large language model with Lean, a formal proof assistant that mechanically checks every step, which means every answer it gives is verifiable. This is the part that gets glossed over: the AI isn't just guessing, it's proving. The proof either compiles or it doesn't. There's no "mostly right." But the 344 problems it didn't solve are just as interesting as the 9 it did. Math is one of the few domains where failure is unambiguous, which means AlphaProof's failed attempts aren't noise; they're data about exactly where the system's reasoning breaks down. That's a research gift.
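To make "the proof either compiles or it doesn't" concrete, here is a deliberately trivial Lean 4 sketch. It has nothing to do with Erdős problems and is not AlphaProof output; it only shows how the checker treats a true statement versus a false one. It assumes a current Lean 4 toolchain, where the `omega` tactic for linear arithmetic is built in.

```lean
-- Accepted: `omega` (a decision procedure for linear arithmetic over Nat/Int)
-- proves this goal, so the file compiles.
theorem two_mul_eq_add (n : Nat) : 2 * n = n + n := by
  omega

-- Rejected: the statement is false (take n = 0), so `omega` reports an error
-- and the file would fail to compile if this were uncommented.
-- theorem bad_claim (n : Nat) : 2 * n = n + 1 := by
--   omega
```

A system like the one described is, in effect, searching for proof text that makes the first kind of file compile. When the search fails, the failure is just as unambiguous.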

Nous Research published something quieter but potentially more disruptive to how the field thinks about model behavior. Their Contrastive Neuron Attribution (CNA) method identifies the specific neurons in a model's MLP layers that activate differently on harmful versus benign prompts. By switching off just 0.1% of activations, they cut refusal rates in half across models ranging from 1 billion to 72 billion parameters without meaningfully degrading output quality: the reported quality score stayed above 0.97. The implication is that changing behavior can be surgical. Instead of retraining an entire model, you find the right neurons and turn them down. It's like discovering that a car's aggressive acceleration isn't an engine problem but one specific spring in the throttle cable, and you can replace just that part.
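Nous Research's actual code isn't described here, so the following is only a minimal sketch of the general idea as summarized above, under stated assumptions: activations for each prompt have already been captured as plain lists of floats (one entry per MLP neuron), neurons are ranked by the gap between their mean activation on the two prompt sets, and "switching off" means zeroing. The function names are hypothetical, and the real CNA method may score, select, or dampen neurons differently.

```python
from statistics import mean


def contrastive_scores(harmful: list[list[float]],
                       benign: list[list[float]]) -> list[float]:
    """Per-neuron gap between mean activation on 'harmful' and 'benign' prompts.
    Each inner list is one prompt's activations over the same set of neurons."""
    n = len(harmful[0])
    return [mean(row[j] for row in harmful) - mean(row[j] for row in benign)
            for j in range(n)]


def select_neurons(scores: list[float], fraction: float = 0.001) -> set[int]:
    """Pick the top `fraction` of neurons by absolute contrast (at least one)."""
    k = max(1, round(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda j: abs(scores[j]), reverse=True)
    return set(ranked[:k])


def ablate(activations: list[float], neurons: set[int]) -> list[float]:
    """Zero the selected neurons; every other activation passes through unchanged."""
    return [0.0 if j in neurons else a for j, a in enumerate(activations)]
```

Zeroing is the bluntest option. Scaling the selected activations down instead (closer to "turn them down") would be a one-line change to `ablate`.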

All of this is happening while the regulatory picture gets messier by the week. Trump's planned executive order, which would have required 90 days of federal access to powerful AI models before public release, was scrapped after pushback from tech allies. The EU, meanwhile, narrowed its list of high-risk AI categories in draft guidance. The two moves, made on opposite sides of the Atlantic, point at the same problem: no one has figured out how to govern something that moves this fast, touches this many industries, and generates this much revenue. The tech companies have a strong incentive to keep it that way. The point isn't that regulation is inherently wrong; it's that any regulation specific enough to be meaningful is also specific enough to be lobbied against.

The China piece is conspicuously absent from most Western AI governance conversations, which is strange given that Alibaba just demonstrated a 35-hour autonomous agent and Qwen is one of the most capable open-weight model families in the world. The EU and US can fragment their regulatory approaches all they want, but a global AI governance framework that doesn't include China is like a climate agreement that skips the world's largest emitter. You can write the treaty. It just won't do what you think it will.

What today's digest is really describing, thread by thread, is the shape of a world that hasn't fully arrived yet but is clearly on its way: agents that run without supervision, infrastructure built to keep them alive, self-improving systems funded at billion-dollar valuations, and governance frameworks that are already a step behind. The contractor analogy from the top of this piece starts to feel inadequate. Contractors finish the job and leave. What's being built right now doesn't leave.

The question worth sitting with isn't whether this future comes. It's who decides what "done" looks like — and whether anyone's even watching when it gets there.
