Ben Santora

SLMs - The Return of Human Agency Through Constraint


For the last few years, most discussions about local small language models have focused on common themes: no need for an expensive GPU, lower power draw, privacy, independence from cloud vendors, and so on. These are all valid advantages, and they were what initially drew me to explore SLMs as well. Like many others, I started by asking a simple question: can I get a language model to run on my laptop, without an NVIDIA GPU? The answer turned out to be yes, and that first step was satisfying.
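
If you're curious what that first step looks like in practice, here's a minimal sketch of CPU-only inference - assuming the llama-cpp-python package and a locally downloaded, quantized GGUF build of a small model (the file path below is a placeholder, not a prescribed location):

```python
# Minimal sketch: running a small language model on a CPU-only laptop.
# Assumes the llama-cpp-python package (pip install llama-cpp-python) and a
# quantized GGUF file downloaded locally - the path below is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/phi-3-mini-4k-instruct-q4.gguf",  # hypothetical local path
    n_ctx=2048,    # modest context window suits a small model
    n_threads=4,   # plain CPU threads - no NVIDIA GPU required
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain what a GGUF file is in two sentences."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```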

I've been using LLMs daily since ChatGPT was first released in 2022, so the first thing I noticed about my new local SLM was its limitations. The novelty began to wear off, and I drifted back to the online LLMs.

But as I tried the many different small language models available - Phi-3, Qwen 2.5, Gemma 3 - I began to see how different they are from the large models. You're forced to work with them differently, precisely because you hit their limits so easily. Small language models are not just reduced versions of large ones - they are divergent because of the degree to which the human is required to engage in order to use them productively. You have to be very clear and very specific with them.

Large language models have trained me and millions of other users into a particular relationship with AI. You learn that you can be vague with the model. You can improvise and bounce ideas off it. You can hint at an idea and let the model infer your intent, fill in gaps, and smooth over contradictions. That generosity is powerful, and you get used to it. It’s why systems like ChatGPT feel conversational, advisory, even oracle-like. The model absorbs ambiguity so that the user doesn’t have to fully resolve it beforehand.

Small language models don’t do that. They can’t afford to. They don't have the bandwidth.

When you work with an SLM, ambiguity isn't absorbed - it's exposed. If you don't know what you want, the model's limitations will quickly make that obvious. If your instructions are vague or inconsistent, the output degrades immediately. At first, this feels like a limitation - a negative. In reality, it's a demand: you realize that making the outcome successful is on you. You have to decide what you mean before you ask. That is a very good habit to develop.

That demand changes everything, because it changes you.

Using an SLM effectively requires the user to externalize his or her intent, define boundaries, and specify constraints up front. You have to stop "playing" with the model and start operating it. I watched this happen in my own work: it shifted from the relaxed, conversational exploration I'd grown accustomed to over thousands of hours with LLMs to something different - the smaller model required deliberate articulation from me. In this way, the model becomes less of an advisor and more of an instrument.
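
To make that concrete, here's a hedged sketch of what "operating" a small model can look like: goal, boundaries, and success condition spelled out before anything is asked. It assumes the same hypothetical local setup as the earlier sketch, and the task and constraint wording are purely illustrative.

```python
# Sketch: externalizing intent before asking. The goal, boundaries, and
# success condition are written down first, then handed to the model whole.
# Assumes the same hypothetical local GGUF file as the earlier sketch;
# the slugify task and constraint wording are illustrative only.
from llama_cpp import Llama

llm = Llama(model_path="./models/phi-3-mini-4k-instruct-q4.gguf", n_ctx=2048)

prompt = (
    "Goal: write a Python function slugify(title) that turns a blog-post "
    "title into a URL slug.\n"
    "Constraints:\n"
    "- Standard library only; no third-party imports.\n"
    "- Lowercase output; runs of spaces or punctuation become a single hyphen.\n"
    "- No leading or trailing hyphens.\n"
    "Success condition: return only the function definition, no commentary."
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": prompt}],
    max_tokens=256,
    temperature=0.2,  # keep the small model on-script rather than exploratory
)
print(out["choices"][0]["message"]["content"])
```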

The relatively small number of people seriously using SLMs are undergoing a change that isn't reflected in discussions or adoption statistics. These users are being forced into a different cognitive position altogether: fully articulating a problem before handing it off. That skill predates modern AI, and in many ways predates modern computing. It was gradually eroded by interfaces that grew more forgiving and more opaque. Perhaps this is the real pushback so many devs feel against the practice of 'vibe coding' - the user is too passive, too lazy, too removed from the process at hand. Small language models reintroduce friction.

Ironically, this makes them more trustworthy in certain roles. Given the right instructions, their results are every bit as useful as those of any LLM. But they don't 'pretend' to understand what you haven't defined. They don't improvise their way around weak specifications. They fail loudly and early, which is exactly what you want from a component in a system.

The real revolution of small language models isn't that they run locally, that they ensure privacy, or that they're accessible without specialized hardware. Those things are all significant. But if you use SLMs often enough, you discover that they offer something even more important: they force the human back into the loop as a thinking component. Large models are extraordinary tools for sense-making and articulation. Small models are extraordinary tools for precise execution - but only if the user is willing to do the work first.

Thus, the difference between LLMs and SLMs isn't about scale. It's about agency. By agency, I mean where intention resides and who is responsible for defining meaning in the interaction.

With large language models, a significant portion of agency is quietly transferred from the user to the system. You provide a loose prompt, and the model runs with it - it decides how to interpret it, what to emphasize, what to ignore, and often what the goal actually is. That isn't a flaw; it's a design choice. It's what makes LLMs feel so helpful, conversational, and broadly accessible. But the cost of that helpfulness is that the model is exercising interpretive agency on your behalf.

Over time, users adapt to that. I certainly did. We stop fully specifying our intent because we don't have to. Maybe we don't know what we want, and we want the LLM to tell us. And it will - the system will infer, improvise, and reconcile inconsistencies. Agency becomes shared and, in practice, blurred. The model is not just executing instructions; it is co-authoring the problem definition.

Small language models just don’t support that kind of transfer. They leave agency where it started. If the task is unclear, it stays unclear. If the goal is under-specified, it isn’t realized. The model doesn’t step in as an interpretive partner; it waits for direction. The user becomes responsible for defining purpose, scope, and success conditions. Believe me, once you get used to it, you're glad. It's like waking up.

There’s also a longer-term implication. When agency is outsourced, even partially, people begin to trust outputs without fully understanding how the conclusion was reached. This quickly becomes habit, a way of working. But when agency is held by the user, trust has to be earned through clarity and reproducibility. These smaller models encourage the latter because they don’t mask weak thinking with their own fluency.

I still use LLMs often - large models are willing to steer with you, or even for you, and sometimes that's exactly what I want. Small models refuse. I no longer see that refusal as a limitation; it's a mechanism that forces the human to stay in control.

Ben Santora - January 2026
