
Kernel Pryanic

Are we Using AI at the Wrong Scale?

What if I told you that AI can be used more efficiently - in a way that actually enhances everybody's life and quality of work? That the AI bubble didn't have to inflate the way it did? That AI could still evolve more rationally and more equitably? You might think I'm just trying to catch your attention. Maybe, but not without reason - I want you to see the other side, the future we can and should build.

By now we've grown used to throwing large language models at everything. We open our favorite IDE with some AI cloud integration and delegate coding to a model running somewhere, reading and analysing everything about our codebase and our behaviour. We visit Google Docs and edit documents via prompts to Gemini, let it reply to our emails, generate pictures, parse out data - we even use it to refine the comments we leave, making them more structured and grammatically correct. We're going to shove AI into every single hole that has data for it to be trained on. I'm not going to say we shouldn't - that's the nature of technological progress, and it doesn't leave much choice. But hey, we're not doomed, and we're not going to be replaced by AI. We're still in the early adoption phase, when most people don't fully grasp what AI is not and where its limits lie, and indulge in a bit too much wishful thinking. So we can still shape it - like we shaped radio, then the internet, then open source. We just need to find a more natural path for this technology.

We've probably all heard every single model release announced like it's some earth-shattering event, destined to overshadow everything before it and boost everything tenfold. Then, when we actually start using the new model, we find a modest improvement - and most of those improvements are rather specific, basically derivatives of what the model was trained on. Take the mysterious announcement of Anthropic's Mythos, supposedly "too dangerous to release" - we don't even know yet whether it justifies the hype. Meanwhile, this experimental article from Aisle already suggests small models can match or outperform it in vulnerability scans - one early experiment, but a telling one. Actually, that's not unusual at all. Chinchilla challenged the "bigger is always better" orthodoxy back in 2022, and the evidence has only stacked up since: small models trained on high-quality data for a dedicated task can match or beat their much larger cousins.
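To make the Chinchilla point concrete: the paper's widely cited rule of thumb is that compute-optimal training uses roughly 20 tokens per parameter. The ratio below is an approximation from that paper, and the parameter counts are illustrative, not tied to any specific release:

```python
# Chinchilla rule of thumb: compute-optimal training uses roughly
# 20 training tokens per parameter (an approximation, not an exact law).
TOKENS_PER_PARAM = 20

def optimal_tokens(params: int) -> int:
    """Approximate compute-optimal training tokens for a model size."""
    return TOKENS_PER_PARAM * params

# Illustrative sizes: a small dedicated model vs. a large generalist.
for params in (4_000_000_000, 70_000_000_000):
    tokens = optimal_tokens(params)
    print(f"{params / 1e9:.0f}B params -> ~{tokens / 1e12:.2f}T tokens")
```

The takeaway is that on a fixed training budget, a small model can be fed far closer to its optimum than a large one - which is part of why focused small models punch above their weight.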

And now, dear reader, we're approaching the actual point of this article: we may be using AI at the wrong scale. We rely on large models for tasks that could be delegated to much smaller ones, often with better results. Corporations are pushing the cloud paradigm hard, but if you look closer, you can see a different path - one that doesn't look like Cyberpunk 2077, one that doesn't require massive H200 clusters just to prettify your CV, one that leads to a more equal distribution of AI and doesn't try to substitute for anybody.

That path consists of small, dedicated models trained to do one thing, or a few specific things at most. Models that are just smart enough to fulfill their purpose, and small enough to avoid creating the false impression that they're replacing anyone. This is the mass AI of the future - a true symbiosis. Or, to be more precise, it's proper tool use, because AI is not a being. It's a simulation of one: a very cleverly engineered statistical model, so good at approximation that it looks like adaptability. And when software fully matures around this idea - not bolting MCPs onto existing tools, but building AI-native from the ground up - that's when the path really opens up. And this path is still the one we can take: the natural path for AI to prosper, and for us to be enhanced, not replaced.
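To make the "tools, not beings" framing concrete, here's a minimal sketch of what AI-native dispatch could look like: each task is owned by one small, dedicated model rather than everything flowing through a single generalist. The registry, task names, and stand-in functions are all hypothetical illustrations, not a real framework:

```python
from typing import Callable

# Registry mapping each task to the small, dedicated model that owns it.
SPECIALISTS: dict[str, Callable[[str], str]] = {}

def specialist(task: str):
    """Register a function as the dedicated model for exactly one task."""
    def register(fn: Callable[[str], str]) -> Callable[[str], str]:
        SPECIALISTS[task] = fn
        return fn
    return register

@specialist("grammar")
def fix_grammar(text: str) -> str:
    # Stand-in for a small fine-tuned grammar model.
    return text.replace("teh", "the")

@specialist("ocr")
def run_ocr(image_path: str) -> str:
    # Stand-in for a purpose-built OCR model.
    return f"<text extracted from {image_path}>"

def dispatch(task: str, payload: str) -> str:
    """Route a request to its specialist; fail loudly if none exists."""
    if task not in SPECIALISTS:
        raise ValueError(f"no dedicated model for task: {task}")
    return SPECIALISTS[task](payload)

print(dispatch("grammar", "teh path forward"))  # → "the path forward"
```

The design choice is the point: the router knows nothing about how each specialist works, so any single model can be swapped, retrained, or run locally without touching the rest.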


P.S. I'm not saying large models are a dead end - they can and should be used where that power is genuinely required, like complex coding or in-depth analysis. But the numbers are hard to ignore. Qwen3-Coder-Next is 80B total parameters but only 3B active per token thanks to MoE - it performs on par with models that have 10–20x more active compute, while running on a single consumer GPU. Go smaller still and it gets more interesting: a Qwen3-4B fine-tuned for a specific task matches a 120B+ model on that task, and is deployable on consumer hardware. Or take Chandra - a small OCR model purpose-built for PDF and image conversion that outperforms Gemini 2.5 Flash on multilingual document benchmarks. Not because it's smarter. Because it's focused. Using large models for everything is the dead end - a massive waste of resources and fuel for the centralization of power.
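The MoE arithmetic above is easy to sanity-check. A common rule of thumb puts a forward pass at roughly 2 FLOPs per active parameter per token; the constant and the parameter counts below are back-of-the-envelope assumptions, not measured benchmarks:

```python
# Back-of-the-envelope comparison of dense vs. MoE per-token compute.
# Rule of thumb (an approximation): a forward pass costs roughly
# 2 FLOPs per *active* parameter per token.

def flops_per_token(active_params: int) -> int:
    return 2 * active_params

dense_80b = flops_per_token(80_000_000_000)      # dense: all parameters active
moe_3b_active = flops_per_token(3_000_000_000)   # MoE: only routed experts run

print(f"dense 80B : {dense_80b / 1e9:.0f} GFLOPs per token")
print(f"MoE 3B act: {moe_3b_active / 1e9:.0f} GFLOPs per token")
print(f"ratio     : ~{dense_80b / moe_3b_active:.0f}x less compute per token")
```

Note this covers compute, not memory: all 80B weights still have to live somewhere, which is why single-GPU deployment typically also leans on quantization or offloading.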
