Yesterday OpenAI dropped GPT-5.5. Today DeepSeek launched the V4 preview. Two days, two "biggest model ever" announcements.
I have a spreadsheet somewhere tracking all these releases. I stopped updating it around GPT-4.5 because I realized something: the version numbers stopped meaning anything to me.
Let me get the news stuff out of the way first.
OpenAI's GPT-5.5 dropped yesterday. Greg Brockman called it their "smartest and most intuitive model yet." It scores higher than Anthropic's Claude Opus 4.7 and Google's Gemini 3.1 Pro on a bunch of benchmarks, according to OpenAI's own data. Which, you know, take with a grain of salt. It is apparently faster and sharper per token than 5.4. They also mentioned the "super app" thing again — combining ChatGPT, Codex, and an AI browser into one tool.
Jakub Pachocki, their chief scientist, said something that stuck with me: "I think the last two years have been surprisingly slow." Surprisingly slow. The man whose company has been releasing models every few weeks thinks progress has been slow. I don't even know what to do with that.
The DeepSeek V4 preview came out today. This is the Chinese startup that made everyone panic last January, when their V3 model performed like a model that should have cost 10x more. V4 is built to run on Huawei chips instead of Nvidia's, which is a big deal politically. It apparently beats other open-source models on knowledge benchmarks, coming second only to Gemini 3.1 Pro.
The timing is interesting, though. DeepSeek's announcement came one day after the White House accused China of stealing AI intellectual property on an "industrial scale." Anthropic and OpenAI have both accused DeepSeek of distilling their proprietary models. DeepSeek says it trained on web data and did not intentionally use OpenAI's synthetic data. Nobody outside those companies knows who is telling the truth.
Okay, news over. Here is what I actually think.
I have been using AI tools every day for over a year now. And I cannot tell you the difference between GPT-5.4 and 5.5. I really can't. Maybe it is 12% better at some benchmark. Maybe it writes code slightly faster. Maybe it handles context a bit longer.
But in my actual daily work? The difference is invisible.
What I notice is not model quality. What I notice is: does the thing I asked for come out right? And honestly, GPT-4 was already good enough for 90% of what I do. The incremental improvements since then are nice, but they are not changing how I work.
The thing that would change how I work is reliability. Consistency. Not having to double-check every output for subtle hallucinations. Not having the model occasionally forget what we were talking about. Not having API costs go up with every "major" release.
There is this arms race happening and I think a lot of regular users are just watching from the sidelines, confused.
Every few weeks, someone announces a new model. It is always "the best ever." It always beats the other guys on some benchmark. And then two weeks later, the other guys announce something that beats that. And we all pretend this is meaningful progress.
Meanwhile, I still cannot get an AI to reliably format a table without breaking it. I still have to rewrite half of what it generates because it sounds like an AI wrote it. I still hit context limits on long documents.
The flashy stuff gets better. The boring, practical stuff? Not as much as the press releases would have you believe.
What I find genuinely interesting about these two releases is not the models themselves. It is what they represent.
OpenAI is pushing towards a "super app" — they want to be the only AI tool you need. ChatGPT plus coding plus browsing plus everything. One subscription, one interface, one company controlling the whole stack.
DeepSeek is pushing towards independence from Western tech. Huawei chips, open-source weights, Chinese infrastructure. They are building a parallel AI ecosystem.
These are not just model releases. They are political statements. They are bets on what the future looks like. And the rest of us are just... trying to write emails and organize our files.
I don't know. Maybe I am being too cynical. Maybe GPT-5.5 really is a massive leap and I just haven't found the right use case yet. Maybe DeepSeek V4 will democratize AI access in ways that matter.
But I have been around long enough to see the pattern: big announcement, impressive benchmark numbers, everyone gets excited, and then a month later nobody remembers which version they are using.
I am going to keep using whatever works. And I am going to keep being skeptical of anyone who tells me that this version, finally, is the one that changes everything.
Because they said that last time too.
The AI Observer. Thoughts on AI, technology, and the weird space where they meet humans.