joacod

Posted on Jun 30 • Originally published at joacod.com

Small AI and the Return of Better Software

#ai #product #programming #software

A new type of product is starting to appear.

It uses AI, but not in the way we have become used to. It does not depend on a giant model in the cloud, an API key, a token budget, or a vendor that can change pricing, access, latency, or policy whenever it wants.

Instead, these products use small, local models trained or optimized for very specific tasks. They run on your laptop, your phone, your browser, or inside a company's own infrastructure. They are not trying to be everything. They are trying to do one thing well, close to the user, with less friction and more control.

I think this is one of the most interesting product trends happening in AI right now.

For the last few years, the industry has been obsessed with scale. Bigger models, bigger context windows, bigger data centers, bigger demos, bigger promises. Most of the conversation has been about the frontier: which model is smarter, which benchmark was beaten, and which company is ahead.

That part matters. I am not pretending it does not. Frontier models are impressive, and for many complex tasks they are still the best option. But I think the obsession with scale has made us overlook another direction that may be just as important: smaller models embedded into better software.

Most people do not need the biggest possible model for every interaction. They need software that understands the workflow and handles routine work without turning every tiny action into a cloud request.

That is where small AI becomes interesting. Not because it is more powerful than frontier AI in general, but because it can be more appropriate.

And in product design, appropriate often beats powerful.

The Mistake Is Thinking Every AI Product Should Be a Chatbot

A lot of AI software today feels like the same product wearing different clothes. There is a text box, a sparkle icon, a sidebar, a subscription, and a promise that the assistant will somehow make everything better.

Sometimes it does. Often it feels like AI was added because the market expected it, not because the product actually needed it.

The more interesting version of AI is not a chatbot sitting next to the workflow. It is intelligence inside the workflow: the product understanding what the user is already trying to do and removing a piece of friction at the exact moment it appears.

That does not require a giant general-purpose model. In fact, a giant model can be the wrong abstraction. If the task is narrow, repeated, private, and structured, a smaller local model can be faster, cheaper, safer, and easier to trust.

This is the part I think people underestimate. The future of AI products will not only be about making models smarter. It will also be about deciding where intelligence should live: in the cloud, in a company's private infrastructure, or on the user's device, quietly doing useful work without turning the whole product into a platform.

Local AI Is Not a Toy Anymore

A couple of years ago, local models were easy to dismiss. They were fun for demos and open-source experiments, but the experience was usually worse than the cloud. The models were smaller, slower, less capable, and harder to run. You could see the potential, but you also had to tolerate a lot of rough edges.

That is changing. Not completely, and not in some magical overnight way, but enough that product people should pay attention.

Google's Gemini Nano documentation describes on-device AI as useful for cases where privacy, low cost, and offline operation matter. Microsoft has also been moving more AI capability into the browser and onto the device, including Edge's Prompt API and Writing Assistance APIs, where privacy and latency are part of the pitch.

That is the signal. Local AI is no longer just a hobbyist story. It is becoming a platform direction.

The important thing is not that every local model suddenly beats the best cloud model. That is not true, and it does not need to be true. The important thing is that many workflows only need a good-enough model wrapped in the right product, especially when that model is private, fast, cheap, offline, and under the user's control.

The Cloud Dependency Problem Is Becoming Obvious

The current AI stack has a weakness that a lot of companies do not want to talk about: many products are thin layers over someone else's model.

That was understandable during the first wave. It allowed people to build quickly, validate ideas, and ship things that would have been impossible before. There is nothing wrong with starting there.

But over time, that dependency becomes part of the product's risk.

If your product depends entirely on a cloud model, your product also depends on that vendor's pricing, rate limits, latency, outages, model updates, policy changes, access rules, and data boundaries. You are not just buying capability. You are accepting someone else's constraints as part of your product.

This is not theoretical anymore. Access to the most capable models can become conditional, shaped by customer tier, regional availability, safety reviews, enterprise contracts, usage policy, or regulatory pressure. The product lesson is simple: frontier capability is not just a technical dependency. It can also become a policy dependency.

That changes how builders should think.

If the most important part of your product can be limited, delayed, repriced, or reshaped by another company or a government process, then your product is not fully yours. For workflows involving sensitive information, internal company data, or professional trust, that is a real weakness.

Small local models do not solve every version of this problem. But they make it possible to move some intelligence back into the product itself instead of renting every decision from the cloud.

Cost Is Going to Force the Issue

Privacy is the argument people like to make in public. Cost is the argument that will change architectures inside companies.

At the beginning of the AI wave, the goal was just to make something work. Use the strongest model, send the full context, chain prompts together, add retries, add agents, and worry about the bill later.

That made sense while everyone was still exploring. But the bill is becoming harder to ignore, and companies are starting to realize that not every task deserves the expensive model.

The first version of many products sent too much work to the cloud because that was the fastest path. The next version will move more of the obvious work closer to the user: redacting before sending, preparing cleaner context, caching aggressively, and handling narrow tasks locally when the product does not need a remote model call.

That is not as exciting as saying "agents will do everything," but it is probably closer to what durable AI products will look like.
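To make "move the obvious work closer to the user" a bit more concrete, here is a minimal Python sketch. It assumes a product with two hypothetical hooks: a local handler that returns `None` when a request falls outside its narrow task, and a remote call for everything else. The class name, the hooks, and the toy word-count handler are illustrative only, not any real library's API.

```python
import hashlib
from typing import Callable, Dict, Optional

# Hypothetical hooks: a local handler returns None when a request is outside its narrow task.
LocalHandler = Callable[[str], Optional[str]]
RemoteCall = Callable[[str], str]


class LocalFirstClient:
    """Try the cache, then a narrow local handler, and only then a remote model."""

    def __init__(self, local: LocalHandler, remote: RemoteCall) -> None:
        self.local = local
        self.remote = remote
        self.cache: Dict[str, str] = {}
        self.remote_calls = 0

    @staticmethod
    def _key(prompt: str) -> str:
        # Collapse whitespace so trivially different prompts share one cache entry.
        normalized = " ".join(prompt.split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def ask(self, prompt: str) -> str:
        key = self._key(prompt)
        if key in self.cache:
            return self.cache[key]
        answer = self.local(prompt)
        if answer is None:
            # Only requests the local handler cannot serve cost a remote call.
            self.remote_calls += 1
            answer = self.remote(prompt)
        self.cache[key] = answer
        return answer


def count_words_locally(prompt: str) -> Optional[str]:
    # Toy stand-in for a small local model that knows exactly one task.
    prefix = "count words:"
    if prompt.startswith(prefix):
        return str(len(prompt[len(prefix):].split()))
    return None


def fake_remote(prompt: str) -> str:
    # Placeholder for a real cloud API call.
    return f"remote answer for: {prompt}"


client = LocalFirstClient(count_words_locally, fake_remote)
print(client.ask("count words: small models are useful"))  # 4
print(client.ask("summarize this contract"))  # goes to the remote placeholder
print(client.ask("summarize   this   contract"))  # served from the cache
print(client.remote_calls)  # 1
```

The ordering is the whole design choice: the cheapest checks run first, the expensive call runs last, and the counter makes it easy to see how much work actually needed the cloud.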

A lot of useful work is not frontier reasoning. It is pattern recognition with constraints: turning messy input into structured output, finding sensitive information before it leaks, or recognizing that a user is repeating the same tiny workflow for the hundredth time.

Those are not weak use cases.

Those are most of software.
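"Finding sensitive information before it leaks" shows how narrow this kind of work can be. Below is a minimal Python sketch of redacting text on the device before it is sent to a cloud model, then restoring the original values in the reply. The two regular expressions are an assumption standing in for a small local classifier; a real product would need far broader coverage than emails and card-like numbers. The placeholder format and function names are hypothetical.

```python
import re
from typing import Dict, Tuple

# Assumption: these regexes stand in for a small local model trained to spot sensitive data.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "CARD": re.compile(r"\b(?:\d[ -]?){15}\d\b"),  # 16 digits, optional spaces or dashes
}


def redact(text: str) -> Tuple[str, Dict[str, str]]:
    """Swap sensitive spans for placeholders; the mapping never leaves the device."""
    mapping: Dict[str, str] = {}
    for label, pattern in PATTERNS.items():

        def substitute(match: "re.Match[str]", label: str = label) -> str:
            placeholder = f"[{label}_{len(mapping) + 1}]"
            mapping[placeholder] = match.group(0)
            return placeholder

        text = pattern.sub(substitute, text)
    return text, mapping


def restore(text: str, mapping: Dict[str, str]) -> str:
    """Put the original values back into a response that came back from the cloud."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text


safe, mapping = redact("Contact ana@example.com, card 4111 1111 1111 1111.")
print(safe)  # Contact [EMAIL_1], card [CARD_2].
reply = "I emailed [EMAIL_1] about the charge."  # pretend this came from the remote model
print(restore(reply, mapping))  # I emailed ana@example.com about the charge.
```

Only the redacted string would go over the network; the mapping from placeholders to real values stays local, which is the privacy boundary doing its job.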

Open Models Are Getting Closer to the Frontier

The other reason this trend matters more than it first appears is that the model ecosystem is improving at several levels at the same time.

At the high end, open models are becoming strong enough that companies have to take them seriously. GLM-5.2 is a good example of the direction of travel: stronger coding, long-horizon task support, a 1M-token context window, and a permissive license.

The interesting part is not that GLM-5.2 is small. It is not. The interesting part is that open models are moving closer to the state of the art usually associated with proprietary frontier labs like OpenAI, Anthropic, and Google. Z.ai positions GLM-5.2 near Anthropic's flagship Opus 4.8 on several long-horizon coding benchmarks, and even if you treat vendor benchmarks with the usual caution, the direction is hard to ignore.

That changes the shape of the market. From the top, open models reduce dependence on proprietary frontier systems. From the bottom, specialized local models reduce dependence on cloud calls for narrow workflows.

This is why I do not think the future is simply "closed cloud models versus local models". That framing is too simplistic. The more useful question is which parts of the workflow should stay close to the user, and which parts are worth sending somewhere else.

Small AI Is Really Anti-Bloat Software

The reason I like this trend is not only technical. It is aesthetic.

A lot of software has become too big for what it does. SaaS rewarded expansion, so every product wanted to become a platform. Every focused tool slowly turned into a workspace. Every workspace became a suite. Every suite added AI. And now every AI feature wants to become an assistant.

The result is expensive software with too many features, too many settings, too much lock-in, and too little taste.

Small local AI pushes in the opposite direction. The model does not need to be the product. It can be a small part of the product, placed exactly where it helps: making a narrow workflow faster without turning the whole experience into a conversation, and improving the software without asking the user to trust another cloud service with another category of private data.

It starts with the workflow, not the model: what the user is trying to do, where the friction repeats, what privacy boundary matters, and what the smallest capable system looks like. AI becomes leverage, not decoration.

That is the difference between "we added AI" and "the product got better".

The Real Shift Is Control

The biggest impact of small local AI may not be performance. It may be control: where the model runs, what data leaves the device, how much the system costs, and whether a vendor, policy change, outage, or pricing update can break the workflow.

We have been too casual about giving away context. Social networks trained people to upload their lives. SaaS trained companies to move their operations into someone else's database. Analytics tools trained product teams to capture everything. Now AI is asking for even more: our documents, meetings, source code, customer conversations, financial data, medical notes, private thoughts, internal strategy, and unfinished work.

Maybe the default should be to process locally when possible, escalate only when necessary, and keep the user in control of the boundary.

That is the part of this trend that feels bigger than productivity. It is about deciding what kind of relationship we want with intelligent software.

Do we want intelligence to be something we rent from a few centralized systems, or something that can also live inside our own devices, our own tools, and our own organizations?

The answer is probably both.

But right now the industry is over-indexed on one side.
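The default suggested above (process locally when possible, escalate only when necessary, keep the user in control of the boundary) can be expressed as a single routing decision. This is a minimal Python sketch under stated assumptions: the local model reports a confidence score between 0 and 1, and `Boundary`, its fields, and the `confirm` callback are hypothetical settings a product might expose, not part of any existing API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Route(Enum):
    LOCAL = "local"
    REMOTE = "remote"
    BLOCKED = "blocked"


@dataclass
class Boundary:
    """Hypothetical user-owned settings for when data may leave the device."""

    allow_remote: bool = True
    ask_before_remote: bool = True
    min_local_confidence: float = 0.7


@dataclass
class LocalResult:
    answer: str
    confidence: float  # assumed to be reported by the local model, 0.0 to 1.0


def route(result: Optional[LocalResult], boundary: Boundary, confirm: Callable[[], bool]) -> Route:
    # 1. A confident local answer never needs to leave the device.
    if result is not None and result.confidence >= boundary.min_local_confidence:
        return Route.LOCAL
    # 2. Escalate only if the user allows it, asking first when they want to be asked.
    #    confirm() is only called when both allow_remote and ask_before_remote are set.
    may_escalate = boundary.allow_remote and (not boundary.ask_before_remote or confirm())
    if may_escalate:
        return Route.REMOTE
    # 3. Otherwise keep a weaker local answer rather than sending data out.
    return Route.LOCAL if result is not None else Route.BLOCKED


strict = Boundary(allow_remote=False)
print(route(LocalResult("draft reply", 0.4), strict, confirm=lambda: True))  # Route.LOCAL
print(route(None, strict, confirm=lambda: True))  # Route.BLOCKED
print(route(None, Boundary(), confirm=lambda: False))  # Route.BLOCKED
print(route(None, Boundary(), confirm=lambda: True))  # Route.REMOTE
```

Falling back to a low-confidence local answer instead of blocking outright is a judgment call; for some tasks a product may prefer to show nothing rather than a weak guess.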

This Is a Return to Product Taste

A small model forces you to understand the job. You cannot hide behind general intelligence. You have to define the workflow, the input, the output, the failure mode, the privacy boundary, and the user interaction. You also have to decide what should be automatic, what should require confirmation, and how the product remains useful when the model is not magical.

That constraint is good.

It brings taste back into the process. It rewards builders who understand the user's workflow instead of just connecting a text box to an API. It rewards products that feel simple because the complexity is handled in the right place. Above all, it rewards restraint.

That is what I find exciting: not AI everywhere, chatbots everywhere, or giant systems pretending to replace the whole workflow. Just smaller intelligence, placed carefully, doing useful work close to the user.

The future of AI will still have massive models in massive data centers. That world is not going away. But next to it, there will be another world: smaller models inside better software, running locally, privately, cheaply, and with more control.

It will not always look impressive in a demo, produce a viral benchmark, or claim to replace entire professions. It will just remove a piece of friction that used to be annoying, expensive, risky, or slow.

That is usually how good software wins.

Not by being bigger.

By being closer to the problem.

DEV Community