DEV Community

Stephan Miller

Originally published at stephanmiller.com

Model Buzz Roundup — Week of June 24, 2026

Last week I called the West’s next two flagship models vaporware. This week one of them shipped, straight into government lockup with the others.

Last week the story was a single model getting unplugged: Claude Fable 5, the number one model on every leaderboard, switched off June 12 by a US export-control directive. I figured that was a one-off horror story, and that GPT-5.6 and Gemini 3.5 Pro, the two heirs everyone was waiting on, would show up and give us something we could actually use.

Reader, they did not. This week GPT-5.6 launched and it’s about as usable as Fable 5, for the exact same reason: Washington. Gemini 3.5 Pro slipped to July. And the one model that quietly went from “interesting” to “essential”? It’s the open-weights Chinese one the whole export-control apparatus is supposedly designed to stop. Except it can’t: the weights are already a download, and it just beat Claude on cybersecurity benchmarks, the exact capability the bans were about. You know how that goes.

The Gate Spread to the Whole Frontier

Here’s the thing that turned this from “a weird week for Anthropic” into “a structural shift”: the export-control net is no longer catching one model. It’s catching every flagship a US lab can ship.

Count them. Start with Anthropic’s Fable 5, still dark, going on three weeks now. On June 26, Commerce Secretary Howard Lutnick sent a follow-up letter that partially lifts the block on Mythos 5 (the heavier sibling), but only for a defined list of US entities, their foreign-national employees, Anthropic’s own foreign staff, and government partners. Fable 5, the model normal humans and API developers actually call, stays banned. The partial thaw is for the spy-cleared crowd, not for you.

Then there’s the new one, OpenAI’s GPT-5.6. It got previewed June 26 as a three-model lineup: Sol (the flagship), Terra (balanced), and Luna (fast and cheap). New “max” reasoning effort, a new “ultra” mode that spins up subagents, state-of-the-art on Terminal-Bench 2.1, big gains on biology evals. Sounds great. You can’t use it. OpenAI limited the launch to roughly 20 government-vetted partners at the request of the US government. This was the first public run of the AI-review process set up under the recent frontier-AI executive order, where a lab hands a “covered” model to the feds for up to 30 days before it can go to trusted partners. OpenAI itself warned that this kind of gating “should not become the long-term default” because it delays everyone downstream. General availability is “in the coming weeks.” Translation: maybe July.

And Google’s Gemini 3.5 Pro? Still not out. Announced at I/O on May 19 with a “give us until next month,” and next month is basically over. As of reporting on June 29, it’s stuck in limited Vertex preview and the public launch has officially slid to July. Two-million-token context, Deep Think reasoning, all very nice, all behind an enterprise gate.

So tally it up: of the three newest, best models from the three biggest US labs, one is suspended, one is government-rationed to 20 partners, and one is a preview that keeps sliding. Every leaderboard “winner” this week carries the same asterisk it carried last week, “of the ones you can actually call,” except now the asterisk applies to the challengers too.

Export Controls Failed Their First Real Test This Week

Now for the part that’s genuinely funny, in the bleak way.

The whole justification for switching off Fable and Mythos was cyber capability, the fear that a jailbreak could turn them into offensive cybersecurity tools. Fine. Defensible premise. Here’s the problem: the day after Anthropic pulled its models, Z.ai shipped GLM-5.2 with open MIT-licensed weights, and this week a security firm sat down and measured it.

Semgrep’s writeup is titled, and I am not making this up, “We have Mythos at Home: GLM 5.2 beats Claude in our Cyber Benchmarks.” TechTimes ran with “AI Export Controls Fail Their First Real Test.” The exact class of capability the directive was meant to contain is now sitting on Hugging Face under an MIT license, FP8 and GGUF quants included, free to download and self-host on hardware nobody can subpoena.

You cannot export-control a torrent. The directive successfully inconvenienced every paying, legitimate user of two American models, while the capability it was worried about walked out the front door in an open-weights release from a lab outside US jurisdiction. That’s not a security win. That’s security theater with a body count of exactly zero bad actors and a whole lot of annoyed developers.

I’m not saying the underlying worry is fake. Frontier cyber capability is a real thing to think hard about. I’m saying that “ban the American version” does precisely nothing when an equivalent open-weights model ships from Beijing a day later, and pretending otherwise is how you get policy that hurts the people following the rules and helps no one else.

The Model Nobody Can Switch Off (Still GLM-5.2)

Same hero as last week, and the case for it got stronger.

GLM-5.2 from Z.ai is the highest-ranked open-weights model on Artificial Analysis’s Intelligence Index. It sits at number six overall (score 51) and parks on the value frontier. VentureBeat clocked it beating GPT-5.5 on SWE-bench Pro (62.1% vs 58.6%) at roughly one-sixth the cost. MIT weights, a usable 1M-token context, and output pricing in the low single digits per million tokens depending on which provider you route through. The weights dropped for real the week of June 22, so the “trust but verify” caveat I put on it last week is now “verify it yourself, it’s right there.”

And it’s not a fluke, it’s the whole shape of the market. On OpenRouter, DeepSeek is the single largest model author by volume at around 17.6% of all platform tokens, and Chinese-origin models account for roughly 46% or more of everything flowing through the platform. Tencent’s Hy3 is still the tool-call king. DeepSeek V4 Flash ($0.28 per million out) is the cheap daily driver an enormous chunk of production traffic quietly runs on. Usage has been voting open-and-cheap for months; this week the news finally made the argument out loud.

Connect last week to this week and the throughline isn’t price, it’s control. Hosted frontier models can be revoked by directive, rationed to 20 partners, or stuck in preview indefinitely. The model in your downloads folder can’t be any of those things. The cheapskate argument and the geopolitics argument keep landing on the same advice: own your weights.
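In code, “own your weights” mostly means never letting a single hosted model be a hard dependency. Here’s a minimal sketch of a fallback chain, assuming each backend is a plain callable that raises when its model has been suspended, gated, or pulled. `ModelUnavailable`, `hosted_frontier`, and `local_weights` are hypothetical placeholders, not any provider’s SDK; in a real setup the local entry would wrap whatever server you self-host the open weights behind.

```python
from typing import Callable, Sequence


class ModelUnavailable(Exception):
    """Hypothetical error a backend raises when its model is suspended, gated, or down."""


def complete_with_fallback(
    prompt: str,
    backends: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each backend in order; return (backend_name, reply) from the first that answers."""
    failures: list[str] = []
    for name, call in backends:
        try:
            return name, call(prompt)
        except ModelUnavailable as exc:
            # Only "model is gone" errors trigger the fallback; real bugs still propagate.
            failures.append(f"{name}: {exc}")
    raise RuntimeError("all backends failed: " + "; ".join(failures))


# Hypothetical stand-ins: a hosted model that got revoked, and a local open-weights model.
def hosted_frontier(prompt: str) -> str:
    raise ModelUnavailable("suspended by directive")


def local_weights(prompt: str) -> str:
    return f"[local] {prompt}"


name, reply = complete_with_fallback(
    "Refactor this function",
    [("hosted-frontier", hosted_frontier), ("local-glm", local_weights)],
)
print(name, reply)
```

The ordering is the design choice: put the paid frontier model first if you genuinely need it, but make sure the last entry is something nobody else can switch off.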

Cheapskate Picks: Best You Can Actually Run

Same method as always. Take the Arena leader in each category, draw a 50-rating-point band below it, find the cheapest model in that band. Arena’s top is so compressed that paying 5–16x more usually buys you a sub-3% rating bump. The wrinkle, still: the leader in five of six categories is Fable 5, which is suspended, so each row also names the cheapest thing you can actually call, anchored to that unusable leader’s rating. Output dollars per million, because output dominates real workloads. Arena snapshot is June 25.
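The selection rule is mechanical enough to write down. A minimal Python sketch, assuming you already have one category’s Arena ratings and output prices in hand; `Model` and `cheapskate_pick` are hypothetical helpers, not part of any Arena or provider API. The sample data is the Math board as described further down (Fable 5, Opus 4.6-thinking, and Gemini 3.5 Flash tied at 1517; Qwen3.7 Max at 1492 and $3.75), with Fable 5 at its $50 list price and marked as not callable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    rating: int            # Arena rating in a single category
    usd_out_per_m: float   # output price, dollars per million tokens
    callable_now: bool     # False if suspended or gated


def cheapskate_pick(models: list[Model], band: int = 50) -> Model:
    """Cheapest callable model within `band` rating points of the category leader.

    The anchor is the top rating on the board even when that leader is
    suspended, i.e. "anchored to the unusable leader's rating".
    """
    top = max(m.rating for m in models)
    in_band = [m for m in models if m.callable_now and m.rating >= top - band]
    if not in_band:
        raise ValueError("no callable model within the band")
    return min(in_band, key=lambda m: m.usd_out_per_m)


math_board = [
    Model("Fable 5", 1517, 50.0, callable_now=False),
    Model("Opus 4.6-thinking", 1517, 25.0, callable_now=True),
    Model("Gemini 3.5 Flash", 1517, 9.0, callable_now=True),
    Model("Qwen3.7 Max", 1492, 3.75, callable_now=True),
]

pick = cheapskate_pick(math_board)
print(pick.name, pick.usd_out_per_m)  # prints: Qwen3.7 Max 3.75
```

Applied strictly, the 50-point rule lands on Qwen3.7 Max rather than the table’s Gemini 3.5 Flash, which is the “absolute floor” option in the Math notes; the table’s pick appears to trade a few dollars for a share of first place. Tightening the band to 20 points returns Gemini 3.5 Flash instead.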

| Category | Leader (status) | Leader $ out | Cheapskate pick | Pick $ out | Δ rating | Cheaper by | On AA value frontier |
|---|---|---|---|---|---|---|---|
| Overall | Fable 5 (SUSPENDED) | $50 | GLM-5.1 | ~$3 | n/a | ~16x | yes (GLM-5.2 #6) |
| Coding | Fable 5 (SUSPENDED) | $50 | GLM-5.1 / GLM-5.2 | ~$3–4 | −39 | ~13–16x | yes |
| Math | Opus 4.6-thinking (tied with suspended Fable 5) | $25 | Gemini 3.5 Flash | $9 | 0 (tie) | ~3x | yes |
| Creative Writing | Fable 5 (SUSPENDED) | $50 | Gemini 3.5 Flash | $9 | −34 | ~6x | nearby |
| Instruction Following | Fable 5 (SUSPENDED) | $50 | Gemini 3.1 Pro | $12 | −37 | ~4x | nearby |
| Hard Prompts | Fable 5 (SUSPENDED) / Opus 4.6-thinking | $25 | Gemini 3.1 Pro | $12 | −25 | ~2x | nearby |
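The “Cheaper by” column is just the leader’s output price divided by the pick’s. A quick Python check using the table’s own prices, taking the “~” figures at face value (that reading is my assumption), gives 16.7x for Overall, 12.5x to 16.7x for Coding, 2.8x for Math, 5.6x for Creative Writing, 4.2x for Instruction Following, and 2.1x for Hard Prompts, which the table rounds loosely to whole multiples.

```python
# Leader and pick output prices ($ per million tokens) from the cheapskate table.
# A "~$3–4" range is listed as both endpoints, so its ratio prints as a range.
rows = {
    "Overall": (50.0, [3.0]),
    "Coding": (50.0, [4.0, 3.0]),
    "Math": (25.0, [9.0]),
    "Creative Writing": (50.0, [9.0]),
    "Instruction Following": (50.0, [12.0]),
    "Hard Prompts": (25.0, [12.0]),
}


def cheaper_by(leader_price: float, pick_price: float) -> float:
    """How many times more the leader costs per output token than the pick."""
    return leader_price / pick_price


for category, (leader, picks) in rows.items():
    ratios = [cheaper_by(leader, p) for p in picks]
    print(category, " to ".join(f"{r:.1f}x" for r in ratios))
```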

What the table is actually saying:

Coding is still the slam dunk. GLM-5.1 (Arena 1525) and GLM-5.2 both sit in the band at a few bucks output. GLM-5.1 lands within two rating points of Claude Sonnet 4.6 (1527, $15), both undercut Sonnet badly on price, and GLM-5.2 throws in the 1M context, the open weights, and that SWE-bench Pro number. Cheapskate methodology and AA’s value frontier agree. Highest-confidence pick of the week, again.

Math got better for the cheapskate. This is the fun one. Gemini 3.5 Flash now literally ties for number one on the Math board at 1517, dead level with Opus 4.6-thinking and the suspended Fable 5. So with Fable out of reach, the cheap model isn’t “the best you can settle for” in math anymore, it’s co-champion, at $9 against Opus’s $25. Want the absolute floor? Qwen3.7 Max (1492, $3.75) is cheaper still and only 25 points back.

Creative writing stays a Gemini 3.5 Flash story at $9. In band, no regrets if words are the product. GLM is cheaper but stylistically thin for prose, so I’m not going to pretend it’s the move here.

Instruction Following has no genuine steal. Nothing under $10 lives in that band. Gemini 3.1 Pro at $12 is the honest value floor and I’m flagging it as “you’re paying for quality” rather than inventing a bargain that isn’t there.

Hard Prompts is the category where the usable leader is right there. Only Fable and Mythos got suspended, not the Opus line. Opus 4.6-thinking (1532, $25) leads among models you can actually call, and Gemini 3.1 Pro (1507, $12) gets you within range for half the money.
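Rating gaps are easier to feel as win rates. This sketch assumes Arena scores can be read as standard Elo ratings (logistic, base 10, 400-point divisor), which is a simplification on my part; under that reading a gap of d points means the higher-rated model is preferred about 1 / (1 + 10^(−d/400)) of the time, ignoring ties. `win_probability` is a hypothetical helper.

```python
def win_probability(rating_a: float, rating_b: float) -> float:
    """Expected share of head-to-heads A wins over B on a standard Elo scale (ties ignored)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


# Hard Prompts: Opus 4.6-thinking (1532, $25) vs Gemini 3.1 Pro (1507, $12)
print(f"{win_probability(1532, 1507):.3f}")  # prints 0.536
# The full 50-point cheapskate band, for scale
print(f"{win_probability(1550, 1500):.3f}")  # prints 0.571
```

Read that way, roughly double the money buys Opus about a 54/46 edge over Gemini 3.1 Pro, and even the edge of the 50-point band is only about 57/43.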

Boring-but-correct summary, unchanged from last week because the cheap end of the market barely moved: if you’re not doing something that truly needs the frontier, GLM-5.1/5.2 and Gemini 3.5 Flash cover most of your week for single digits per million tokens, and at least one of them runs on your own iron.

Horror Stories from the Wild

First, the launch you’re not invited to. GPT-5.6 “shipped” June 26, and for ~99.99% of developers that meant reading a blog post about a model they can’t touch. No API, no app, no Playground. Twenty vetted partners and a “coming weeks” promise. OpenAI publicly admitted the gating delays “users, developers, enterprises, cyber defenders, and global partners.” A launch you can’t use isn’t a launch, it’s a press release with benchmarks attached, and you can’t verify the benchmarks either. (VentureBeat, The Hacker News)

Then there’s the flagship that’s still missing. Three weeks on, if you shipped anything on Fable 5 during its brief public window in early June, you’re still holding a dead dependency, and the June 26 partial reprieve doesn’t rescue it: the reprieve covers Mythos 5, not Fable 5, and only for Annex A entities with a security clearance. The lesson from last week didn’t expire. It compounded. “Hosted by a responsible lab” is not “under my control,” and the failure mode is sometimes a federal agency that doesn’t care about your sprint. (Anthropic’s statement, Forbes)

Where This Leaves You

I came into June expecting to spend these posts arguing about benchmark deltas. Instead I’ve spent three of them watching the US government become the most important variable in which model you can run. That’s the genre now. The frontier is real and moving fast, and it is also increasingly a thing that can be switched off, rationed, or delayed by people who have never seen your codebase.

No inspiration porn, just the unglamorous read: if your work genuinely needs the absolute top of the stack, pay for it. But architect like it can vanish, because for three labs in a row this month it either did or never arrived. For everything else, which is most things, the open-weights stuff isn’t settling anymore. GLM-5.2 ranks sixth overall on Artificial Analysis’s Intelligence Index, it beat the banned American model on exactly the kind of cyber benchmarks behind the ban, and you can download it for free right now. Sit with that one.

The champ’s still in jail, his heirs are either in the same jail or running late, and the model in your downloads folder is doing your coding for a few dollars a million tokens and asking no one’s permission. Pick accordingly. And configure a fallback this time. I keep being the guy who learns that the hard way so you don’t have to.
