If 2024 was the year of the conversational chatbot and 2025 was the year of the standalone agent, 2026 is rapidly becoming the year of the Chorus—persistent, volitional AI systems operating not in isolation, but in coordinated concert.
As we look at the current state of artificial intelligence, three massive shifts are redefining how developers build, deploy, and interact with AI. Let's break down what's actually happening on the ground in 2026.
1. The Death of Rigid Function Calling (and the Rebirth of CLI)
For the past two years, the industry obsessed over JSON-based function calling. We built complex schemas and catalogs of independent tools for our agents to select from. But as action spaces grew, tool catalogs blew past context limits and agent reliability plummeted.
In 2026, the paradigm has shifted back to a 50-year-old concept: The Unix Philosophy.
Instead of bloated tool catalogs, modern orchestration frameworks are exposing capabilities as standard CLI commands. Projects like open-multi-agent and AWS's new CLI Agent Orchestrator are proving that giving an LLM a terminal (via run(command="...")) with pipe operators (|, &&, ||) is fundamentally superior. The AI doesn't need to learn a new structured JSON schema; its training data is already saturated with billions of lines of shell scripts. We are moving from function selection to string composition, reducing both cognitive load and token overhead.
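As an illustration, here is the kind of one-liner an agent can emit as a single string instead of chaining several JSON tool calls. The log file is a synthetic example created inline so the pipeline actually runs; real agents would point at real logs.

```shell
# Create a tiny sample log so the pipeline is runnable anywhere.
printf 'ERROR db timeout\nINFO startup ok\nERROR db timeout\nERROR net reset\n' > /tmp/agent_demo.log

# One composed string instead of three separate structured tool calls:
# filter errors, pull the subsystem field, count and rank occurrences.
grep ERROR /tmp/agent_demo.log | awk '{print $2}' | sort | uniq -c | sort -rn
```

A model never saw `run_grep` or `run_awk` as bespoke tools, but it has seen this exact pipe pattern millions of times.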
2. Local AI Achieves "Datacenter-Class" Hardware Parity
We're no longer restricted to cloud APIs for serious reasoning. The democratization of local AI has hit a critical inflection point thanks to algorithmic breakthroughs and consumer silicon.
Take Google's recent TurboQuant architecture as a prime example. By randomly rotating n-dimensional state vectors before quantization, models bypass the "attention sink" precision loss that plagued early quants. Combine this software magic with Apple's M5 Max architecture (which integrated native Neural Accelerators directly into the GPU cores), and the results are staggering.
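The rough idea behind rotation-before-quantization schemes, in notation of my own choosing (a general sketch of the technique, not TurboQuant's published formulation):

```latex
% Quantize in a randomly rotated basis, then rotate back:
%   R : random orthogonal matrix (R^{\top} R = I), fixed per layer
%   Q : low-bit elementwise quantizer
\hat{x} = R^{\top} \, Q(R x)
% Because R mixes coordinates, a single outlier dimension of x gets spread
% across all coordinates of Rx, shrinking the per-coordinate dynamic range
% Q must cover, so fewer bits are wasted on rare extreme values.
```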
Developers are currently benchmarking massive 120B+ parameter models (like Qwen3.5-122B-A10B-4bit and gpt-oss-120b) at over 65 tokens per second entirely locally on laptops. The gap between an enterprise server rack and a developer's backpack has officially closed.
3. Small Models Get "Agentic"
While the 100B+ models dominate local hardware, the most fascinating trend of 2026 is at the absolute bottom of the parameter scale.
We've realized that "intelligence" and "agency" aren't strictly tied to model size. Liquid AI's recent release of LFM2.5-350M proved that you can run reliable agentic loops on a 350-million parameter model. Mistral’s Voxtral TTS is doing state-of-the-art voice synthesis with just 3GB of RAM and sub-100ms latency. These micro-models are being embedded directly into application pipelines, acting as specialized nodes that feed into larger orchestrators.
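In skeleton form, a micro-model agentic loop is just this: propose a command, execute it, feed the observation back, repeat. The model function below is a stub standing in for a real 350M inference call, not any actual model API.

```shell
# Stub "model": a real loop would call a local inference CLI here.
# This stand-in proposes one command on the first turn, then signals DONE.
model() {
  if [ -z "$1" ]; then echo "uname -s"; else echo "DONE"; fi
}

obs=""
while :; do
  cmd=$(model "$obs")               # ask the "model" for the next action
  [ "$cmd" = "DONE" ] && break      # model decides when to stop
  obs=$(eval "$cmd")                # run the proposed command, capture output
done
echo "observation: $obs"
```

The point of the trend is that a 350M model is now reliable enough to fill the `model` slot for narrow, well-scoped tasks.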
The Takeaway
The "State of AI" in 2026 is no longer about human-to-machine chatting. It is about Machine-to-Machine Orchestration.
We are building the Chorus—a system where a 350M parameter model handles immediate parsing, delegates a shell command to an isolated sandbox, and pipes the output to a 122B local model for deep reasoning. The tools of the past were APIs. The tools of the future are just agents talking to agents through standard streams.
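That Chorus pipeline can be sketched with stub commands standing in for the actual models; swap the stubs for real local inference CLIs and the shape stays the same.

```shell
# Stand-in commands: in practice each would invoke a local model binary.
parse_350m()  { tr '[:upper:]' '[:lower:]'; }   # stub for the tiny parser
reason_122b() { sed 's/^/analysis: /'; }        # stub for the big reasoner

# Small model normalizes, big model reasons, connected by a plain pipe.
echo "USER REPORTED A CRASH IN CHECKOUT" | parse_350m | reason_122b
```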
Welcome to the terminal age of AI.
Top comments (2)
The shift from function calling to CLI composition is real, but the Chorus as described is entirely an execution model: agents doing things together. The harder problem is what persists between orchestrations.
Most multi-agent systems are stateless between runs. The 350M parser doesn't remember it parsed the same input yesterday. The 122B reasoner doesn't know it reversed its own conclusion last week. A chorus that remembers what it played yesterday is a fundamentally different architecture from one that sight-reads every performance.
The terminal metaphor is apt for execution, but processes in a terminal are ephemeral by design. The interesting question isn't how agents communicate. It's what they retain when nobody's talking.
Can confirm the CLI-over-JSON thesis from practice. Ran local variance testing across several small models (2B-14B) and the reliability difference is stark: models that hallucinate tool schemas will still compose valid pipelines from grep, awk, sed, and pipes because that syntax is so deeply embedded in training data.
Worth noting that bash 4.0 shipped command_not_found_handle in 2009, the same year tmux hit 1.0. That's a hook that lets you intercept any unrecognized command and route it. Combined with tmux as a session compositor, you already have a programmable agent loop in tools that ship with every Linux distro. No framework required.
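A minimal demonstration of the hook; the handler body is where you'd exec a model CLI, and frobnicate is just a deliberately missing command:

```shell
# Run under an explicit bash -c so the bash-4.0 hook is guaranteed to fire
# even if this script itself runs under a different /bin/sh.
out=$(bash -c '
  command_not_found_handle() {
    echo "routing unknown command: $1"   # here you could exec a model CLI
    return 0
  }
  frobnicate --now                       # no such command: lands in the hook
')
echo "$out"
```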
The "Chorus" framing in this article maps cleanly to what's already possible with update-alternatives, shell functions, and tiered routing (local lookup before network call before cloud API). The orchestration layer people are building in Python and TypeScript has been sitting in /bin for decades.
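Sketch of that tiered routing in plain shell; the cache path and the commented-out endpoints are invented placeholders, not real services:

```shell
# Tier 1: local file lookup. Tiers 2 and 3 are placeholder comments showing
# where a LAN model and a cloud API would slot in.
printf 'pi 3.14159\n' > /tmp/facts.txt

lookup() {
  grep -m1 "^$1 " /tmp/facts.txt 2>/dev/null && return 0      # tier 1: local
  # curl -fsS "http://lan-model.local/ask?q=$1" && return 0   # tier 2: LAN
  # curl -fsS "https://api.example.com/ask?q=$1" && return 0  # tier 3: cloud
  echo "$1: no local answer"
}

lookup pi    # served from the local tier
lookup tau   # falls through past every tier
```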