Unlocking insights from OpenRouter's 2025 State of AI Report
The rapid evolution of Large Language Models (LLMs) has fundamentally changed the landscape of software development. Things are moving so fast, in fact, that it can be hard to spot the deeper trends that will last.
That's exactly where OpenRouter's new State of AI Report comes in. Released in collaboration with a16z (Andreessen Horowitz), the massive report is based on an empirical analysis of over 100 trillion tokens -- yes, that's *trillion* with a t -- across a broad range of tasks and use cases.
A number of the trends identified in the report fit precisely with what we've seen at Kilo. The analysts highlight:
Coding On the Rise: Programming queries skyrocketed from just 11% of total AI usage in early 2025 to over 50% by the end of the year. Coding is now the fastest-growing category across all models.
Supercharged Tool Use: Programming is a primary driver for models that use "tools" (like executing code or searching docs), and models optimized for reasoning and tool use now process over 50% of total tokens (there's a minimal sketch of what tool use looks like just after this list).
Anthropic Fever: For most of 2025, Anthropic's Claude series (specifically Sonnet) dominated the programming sector, holding a staggering 60%+ market share.
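To make "tool use" concrete, here's a minimal sketch of what a tool-enabled request looks like against an OpenAI-compatible chat completions API such as OpenRouter's. The run_python tool and its schema are our own illustration, not something from the report:

```python
from openai import OpenAI

# Any OpenAI-compatible endpoint works here; OpenRouter's is one example.
client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="sk-or-...")

# A hypothetical tool the model may call instead of answering directly.
tools = [{
    "type": "function",
    "function": {
        "name": "run_python",
        "description": "Execute a Python snippet and return its stdout.",
        "parameters": {
            "type": "object",
            "properties": {"code": {"type": "string"}},
            "required": ["code"],
        },
    },
}]

response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4",
    messages=[{"role": "user", "content": "What does 2**20 evaluate to?"}],
    tools=tools,
)

# If the model chooses the tool, its name and JSON arguments come back here
# for your code to execute and feed into a follow-up turn.
print(response.choices[0].message.tool_calls)
```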
Looking to the future, though, things are continuing to evolve rapidly.
The report also reveals a significant rise in open-weight (OSS) models, which have grown steadily over the past year from minimal usage to approximately one-third of overall usage -- mostly, but not only, due to highly competitive OSS models from labs based in China. Our own model leaderboard shows similar trends, with a mix of open and closed, paid and free models actively used across a number of different modes in Kilo (Architect, Code, Debug, etc.).
And on top of that, the State of AI analysts found a fundamental shift toward "agentic inference," where users increasingly rely on models for complex, multi-step reasoning and tool use rather than relatively simple text, number or image generation.
So, how does this change AI development in practice? Is it better to build out those complex workflows with tried-and-true models, or should you always try to stay on the cutting edge with the latest technology?
How do product and engineering teams really win with AI?
The 'Glass Slipper' Effect
One of the most fascinating findings in the report is what the team calls the "Cinderella Glass Slipper" effect.
Diving into the data, they found a powerful retention pattern for select new model launches: early users whose engagement persists far longer than that of later cohorts. When a foundational cohort formed around a model with a unique offering, it retained far better than later cohorts did, even once the issues with the initial release were worked out.
These findings run counter to the idea that launches are just a bunch of noise, or that most developers want to avoid initial releases.
Instead, the report shows that finding the right "fit" early on makes a big difference:
These cohorts are not merely early adopters; they represent users whose workloads have achieved a deep and persistent workload-model fit. Once established, this fit creates both economic and cognitive inertia that resists substitution, even as newer models emerge. ... When a newly released model happens to match a previously unmet technical and economic constraint, it achieves the precise fit: the metaphorical "glass slipper."
The report found that *typical four-month retention is around 10-20%*, but foundational cohorts for strong models that launched at the right time (to the right audience) saw significantly higher retention.
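To pin down what "month-4 retention" means here, below is a minimal sketch of the standard cohort calculation. It's our own illustration with made-up data, not the report's exact methodology:

```python
# Hypothetical usage log: (user_id, month_of_activity) pairs.
activity = {
    ("alice", "2025-05"), ("alice", "2025-09"),
    ("bob",   "2025-05"),
    ("carol", "2025-05"), ("carol", "2025-09"),
    ("dave",  "2025-06"), ("dave",  "2025-09"),
}

def cohort_retention(activity, launch_month: str, later_month: str) -> float:
    """Share of the launch-month cohort that is still active in a later month."""
    cohort = {user for user, month in activity if month == launch_month}
    retained = {user for user, month in activity
                if month == later_month and user in cohort}
    return len(retained) / len(cohort)

# Month-4 retention for a May 2025 launch cohort, measured in September.
print(cohort_retention(activity, "2025-05", "2025-09"))  # 2 of 3 users -> ~0.67
```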
Anthropic's Claude Sonnet 4 launch is a strong example: roughly 40% of the initial "launch cohort" was still using the model by month 4. The Sonnet series has remained highly popular; nonetheless, later cohorts didn't have the same staying power.
Of course, not all early models win. But if you can build a foundational cohort around strong performance, even on one key benchmark that fits a repeatable use case for your power users, then you have a much better chance. Google's Gemini 2.0 Flash missed the mark, with its foundational cohort at only 10% retention by month 5. But the next Flash release, Gemini 2.5 Flash, made a splash and was still at over 30% retention by month 5.
Gemini 3 Flash Preview launched on Kilo yesterday and is already in the top 20 for three of our modes. Google itself is calling it "built for speed" and comparing it favorably against their larger models on certain benchmarks. But is Google establishing the right foundational cohort? Only time will tell.
It's not just about being first; it's about finding a perfect match.
Dancing at Kilo Speed
What stands out in OpenRouter's research is the emphasis on solutions that work well for both frontier labs and end users. Early adoption forces you to solve the integration problem first. Developers who master weaving AI assistance into their IDE, pipeline and review processes gain an immense, sustained efficiency lead over those who try to bolt it on later.
The most effective path to AI coding success is not waiting for the perfect model to emerge, but taking advantage of new models and really pushing them to the limit.
Although I agree with the report's analysis that the "next competitive frontier" comes down to "how effectively models can perform sustained reasoning," that type of reasoning isn't always necessary when it comes to coding. Sometimes you just need a model that's fast and efficient, or remarkably good at code review.
At Kilo, we're here to help you find your fit. We already support over 500 models and we're dropping new models every week, including free access to stealth models from the top frontier labs in the world. This means that you can always choose the best model for the task, and switch freely between them without changing your workflow.
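Under the hood, switching models typically amounts to changing a single identifier in an otherwise identical request. Here's a minimal sketch of that pattern against an OpenAI-compatible endpoint like OpenRouter's; the model IDs are illustrative, and this shows the general idea rather than Kilo's internal implementation:

```python
from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="sk-or-...")

# Same request shape for every model; only the identifier changes.
for model_id in ["anthropic/claude-sonnet-4",
                 "google/gemini-2.5-flash",
                 "deepseek/deepseek-chat"]:
    reply = client.chat.completions.create(
        model=model_id,
        messages=[{"role": "user",
                   "content": "Review this function for off-by-one errors: ..."}],
    )
    print(model_id, "->", reply.choices[0].message.content[:80])
```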
We'll see you at the digital ball!


