Yesterday I wrote about ripping our monolithic agent into five specialized ones. Today OpenAI ships a model that collapses chat, code generation, and web browsing back into a single unified agent. The timing is almost comedic.
But here's what's actually happening — and it's not a contradiction. OpenAI is betting that if the base model is powerful enough, you don't need the orchestration layer. Their dual-tier reasoning architecture switches between fast and slow thinking modes automatically. It's like having a junior dev and a senior architect in the same brain, and the brain decides who shows up based on the problem.
I've been testing the API since yesterday. The 2M token context window is real and usable — not the "technically works but degrades badly after 500K" we've seen before. For our IoT telemetry pipelines, that means fitting days of sensor data into a single call without the chunking gymnastics we've been doing.
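For anyone who hasn't lived the chunking gymnastics: here's a hypothetical sketch of the kind of windowing we've had to do, packing serialized telemetry records into token-budgeted chunks with overlap so boundary events survive. Everything here is illustrative (the `chunk_records` helper and the chars/4 token estimate are mine, not any real SDK); the point is that a usable 2M-token budget turns many chunks into one.

```python
def chunk_records(records: list[str], budget_tokens: int, overlap: int = 50) -> list[list[str]]:
    """Greedily pack records into chunks under a token budget, repeating the
    last `overlap` records at each boundary so cross-chunk events aren't lost.
    Token counts are a crude chars/4 estimate, not a real tokenizer."""
    est = lambda r: max(1, len(r) // 4)
    chunks, current, used = [], [], 0
    for rec in records:
        t = est(rec)
        if current and used + t > budget_tokens:
            chunks.append(current)
            current = current[-overlap:]  # carry boundary context forward
            used = sum(est(r) for r in current)
        current.append(rec)
        used += t
    if current:
        chunks.append(current)
    return chunks

# A day-ish of fake sensor readings:
readings = [f"sensor_7,{i},{20 + i % 5}" for i in range(10_000)]
print(len(chunk_records(readings, budget_tokens=500)))        # dozens of chunks
print(len(chunk_records(readings, budget_tokens=2_000_000)))  # → 1
```

With a small budget you get dozens of overlapping windows to fan out, merge, and deduplicate; at 2M tokens the whole stream fits in one call and that layer of the pipeline just disappears.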
The part that matters for builders: pricing held at $2.50/$12 per million tokens. Same as GPT-5.4 but 40% better at coding and reasoning. We're now firmly in the era where model capability improves while cost stays flat. That's not a trend — that's a subsidy war, and builders are the ones who win.
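To make the economics concrete, here's a back-of-the-envelope cost function at those quoted rates ($2.50 in / $12 out per million tokens). The helper name is mine; the arithmetic is just the quoted prices applied to a maxed-out context call.

```python
INPUT_PER_M = 2.50    # USD per 1M input tokens (quoted rate)
OUTPUT_PER_M = 12.00  # USD per 1M output tokens (quoted rate)

def call_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of one API call at the quoted per-million-token rates."""
    return input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M

# A full 2M-token context call with a 4K-token response:
print(round(call_cost(2_000_000, 4_000), 2))  # → 5.05
```

Five dollars to reason over days of telemetry in one shot. That's the number the subsidy war is holding flat while capability climbs.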
My take: both patterns will coexist. Unified agents for straightforward workflows where one model can hold the full context. Decomposed multi-agent systems for complex domains where specialization still wins. The architecture question isn't monolith vs. microservices anymore — it's knowing which problems are simple enough that a sufficiently powerful monolith just works.