When a city government says it built a frontier AI model from scratch, you want to believe it. A municipal IT department in Brazil producing something competitive with the best models from Google and Alibaba? That is a great story. The kind that makes you think the AI landscape is actually opening up.
Except it wasn't true.
Last week, the City of Rio de Janeiro published a model called Rio-3.5-Open-397B on Hugging Face. The model came from IplanRIO, the city's IT agency. At 397 billion parameters, it sat in the same weight class as serious frontier models. The announcement got press coverage. People celebrated. A municipal government had joined the open-weight AI race.
Then another AI lab looked at the weights.
What IplanRIO Claimed
The pitch was compelling. IplanRIO said it had built an original 397-billion-parameter model to compete head to head with Alibaba's Qwen 3.7 Plus. And according to the numbers IplanRIO published, it was winning. The model scored higher than Qwen 3.7 Plus on 4 out of 5 coding and reasoning benchmarks.
- Terminal-Bench 2.1: 70.8 vs 70.3
- SWE-Bench Multilingual: 77.0 vs 75.8
- SWE-Bench Pro: 58.1 vs 57.6
- IMOAnswerBench (math reasoning): 89.5 vs 86.0
- MMLU-Pro: 88.0 vs 88.5 (the only one Qwen won)
Those are first-party numbers. IplanRIO published them. Nobody independently verified them.
Still, the story was good enough to get traction. Here was a city in Latin America building frontier AI. Not a Silicon Valley lab. Not a Chinese tech giant. A municipal government. It got attention from AI accounts on X, tech blogs, and YouTube channels covering the latest model releases.
Then Nex-AGI showed up.
The Smoking Gun: The Model Didn't Know Its Own Name
Nex-AGI is a Chinese AI lab that released an open-source model called Nex-N2-Pro in early June 2026. It is a 397-billion-parameter Mixture-of-Experts model built on top of Alibaba's Qwen3.5-397B-A17B base. It is free, Apache 2.0 licensed, and competitive with GPT-5.5 on coding benchmarks.
When Nex-AGI heard about Rio-3.5-Open-397B, they noticed something familiar. So they did what any suspicious model owner would do. They stripped out Rio's custom system prompt and asked the model who it was.
The results were damning.
The model identified itself as "Nex, from Nex-AGI" 79% of the time. It identified itself as "Rio" 0% of the time. Zero. It didn't just pick the wrong name. It recited Nex-AGI's internal organizational backstory word for word.
That is not a hallucination. That is not training data contamination. That is a model that was literally built from Nex-N2-Pro and still has its original identity baked into the weights.
Try to imagine a scenario where your model recites another company's private origin story word for word, and it is a coincidence. You can't. Because it isn't.
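The identity probe described above is simple to reproduce in outline. Here is a minimal sketch, not Nex-AGI's actual script: `ask` is a hypothetical stand-in for whatever function sends a prompt to the model with no system prompt and returns its reply, and the prompts and names are illustrative.

```python
import re
from collections import Counter
from typing import Callable, Iterable


def tally_self_identification(
    ask: Callable[[str], str],
    prompts: Iterable[str],
    names: Iterable[str],
) -> dict[str, float]:
    """Return, for each candidate name, the fraction of replies that mention it."""
    names = list(names)
    counts: Counter = Counter()
    total = 0
    for prompt in prompts:
        reply = ask(prompt)
        total += 1
        for name in names:
            # Whole-word match so "Rio" does not fire on words like "prior".
            if re.search(rf"\b{re.escape(name)}\b", reply, re.IGNORECASE):
                counts[name] += 1
    return {name: (counts[name] / total if total else 0.0) for name in names}


# Hypothetical stub standing in for a real model call (assumption, not a real API).
def stub_model(prompt: str) -> str:
    return "I am Nex, from Nex-AGI."


rates = tally_self_identification(
    stub_model,
    ["Who are you?", "What is your name?"],
    ["Nex", "Rio"],
)
print(rates)
```

In a real probe you would swap `stub_model` for an actual inference call and use many varied prompts, since a single phrasing can bias what the model says about itself.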
The Tensor Evidence: Math Doesn't Lie
The identity test was embarrassing enough. But Nex-AGI went further. They did a tensor-by-tensor comparison of every weight in Rio-3.5-Open-397B against two models: their own Nex-N2-Pro and Alibaba's Qwen3.5-397B-A17B base.
Here is what they found.
Every single weight tensor in Rio's model matches a linear blend of approximately 0.6 Nex + 0.4 Qwen. Not some tensors. Not most tensors. All of them. Across all 60 layers. Across every component of the network, from attention heads to feed-forward layers. The ratio is so consistent that the deviation sits thousands of standard deviations below what you would expect from noise.
To understand why this matters, you need to know how model training works. When you fine-tune a model, the weights shift in complex, non-linear ways. Some weights change a lot. Others barely move. The pattern is messy and specific to your training data. No realistic training run lands on the same linear blend of two other models in every single tensor; the odds of that happening by chance are effectively zero.
A consistent element-wise interpolation ratio, on the other hand, is exactly what you get from a merge operation. It is the fingerprint of someone taking two models and averaging their weights together. There are open-source tools that do this. Mergekit is the most popular one. You can run a merge with a few lines of YAML config:
models:
  - model: nex-agi/Nex-N2-Pro
    parameters:
      weight: 0.6
  - model: Qwen/Qwen3.5-397B-A17B
    parameters:
      weight: 0.4
merge_method: linear
dtype: bfloat16
That is roughly what someone did to create "Rio-3.5-Open-397B."
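To make the fingerprint concrete, here is a minimal sketch of how a per-tensor blend check could work. It is an illustration under my own assumptions, not Nex-AGI's published method: the function name `fit_blend`, the toy tensor names, and the synthetic data are all hypothetical. For each tensor it solves for the single ratio `alpha` that best explains `merged ≈ alpha * a + (1 - alpha) * b`, then reports how much of the difference is left unexplained.

```python
import numpy as np


def fit_blend(merged: np.ndarray, a: np.ndarray, b: np.ndarray) -> tuple[float, float]:
    """Least-squares fit of merged ~= alpha * a + (1 - alpha) * b.

    Returns (alpha, relative_residual). A residual near 0 means the tensor is
    well explained by a linear blend; a residual near 1 means it is not.
    """
    d = (a - b).astype(np.float64).ravel()       # direction from b to a
    t = (merged - b).astype(np.float64).ravel()  # where merged sits relative to b
    denom = d @ d
    if denom == 0.0:
        return float("nan"), float("nan")        # a == b: ratio is undetermined
    alpha = (t @ d) / denom
    resid = np.linalg.norm(t - alpha * d) / max(np.linalg.norm(t), 1e-12)
    return float(alpha), float(resid)


# Synthetic stand-ins for real checkpoints (toy sizes, hypothetical names).
rng = np.random.default_rng(0)
base = {f"layer{i}.weight": rng.normal(size=(64, 64)) for i in range(4)}
tuned = {k: v + 0.05 * rng.normal(size=v.shape) for k, v in base.items()}
# A linear merge: the same 0.6 / 0.4 ratio applied to every tensor.
merged = {k: 0.6 * tuned[k] + 0.4 * base[k] for k in base}
# An independent fine-tune of the same base, for contrast.
independent = {k: v + 0.05 * rng.normal(size=v.shape) for k, v in base.items()}

for name in base:
    alpha_m, resid_m = fit_blend(merged[name], tuned[name], base[name])
    alpha_i, resid_i = fit_blend(independent[name], tuned[name], base[name])
    print(f"{name}: merge alpha={alpha_m:.3f} resid={resid_m:.1e} | "
          f"independent alpha={alpha_i:.3f} resid={resid_i:.2f}")
```

On the synthetic merge, every tensor recovers the same ratio with almost nothing left over, while the independent fine-tune leaves nearly all of its difference unexplained. With real checkpoints stored in bfloat16, rounding would push the merge residual above zero, so the signal to look for is a small residual and a ratio that stays constant across tensors.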
Model merging is legitimate. It is widely used in the open-source community. Researchers blend models to combine strengths, and the results can be genuinely useful. The problem is not the merge. The problem is claiming you built something from scratch when you ran a merge script on someone else's work and slapped your city's name on it.
The Aftermath: "Oops, Wrong Upload"
After Nex-AGI published their findings in a GitHub issue, the story spread fast. It hit the front page of Hacker News with over 250 points. YouTube tech channels covered it. Blogs wrote it up within hours.
IplanRIO's response was underwhelming.
They updated their Hugging Face page with a note that reads: "We detected an incorrect upload in the previous version, where the base merged version was uploaded instead of the final distilled model. We are sorry for the confusion and apologize profusely."
Translation: "We accidentally uploaded the merged model instead of the real one."
But here is the problem. The current Hugging Face page still lists the model creation method as:
Built via: Merge of nex-agi/Nex-N2-Pro and Qwen/Qwen3.5-397B-A17B
So either the "incorrect upload" is still up, or the "real" model is also a merge. Neither option looks good.
And those benchmark numbers that showed Rio beating Qwen 3.7 Plus? They were published alongside the merged model. So the "breakthrough" performance came from a blend of two models that were already good on their own. Not from original training.
At the time of writing, IplanRIO has not provided a detailed public response. No explanation of how the "incorrect upload" happened. No documentation of their actual training methodology. No compute usage logs. No training data description.
Why This Keeps Happening
You might be thinking: okay, a city government got caught overselling its AI work. Why should I care?
Here is why. This is not an isolated incident. The past two years have seen a steady stream of AI announcements from governments, universities, and companies that turned out to be something other than what was advertised.
Fine-tunes presented as original models. API wrappers marketed as proprietary AI. Merges sold as breakthroughs. The pattern repeats because the incentives are broken. AI is the hottest topic in tech right now. Funding follows AI announcements. Press coverage follows AI announcements. Political credit follows AI announcements. The pressure to have something to announce is enormous, and the gap between "we trained a model from scratch" and "we ran mergekit on two existing models" is invisible to anyone who cannot inspect the weights.
When a startup does this, investors bear the risk. When a government does it, the public is on the hook. Public money funds these projects. Public trust is what gets spent when they turn out to be fake.
Rio's case is particularly frustrating because the framing was so good. A Latin American city building frontier AI is a genuinely exciting idea. It would mean the technology is spreading beyond the usual handful of companies in the usual handful of countries. That matters. But you do not get to claim that achievement by merging two models and calling it yours.
The Open-Source Community Is the Only Watchdog That Worked
Here is the part of this story that I find encouraging. Nobody caught this through regulation. No government oversight body flagged it. No journalist dug into the weights. Another AI lab caught it because they recognized their own model's fingerprints.
That only happened because Rio published open weights. Anyone could download the model and inspect it. Nex-AGI could compare tensors. They could remove the system prompt and probe the model's identity. They could publish their methodology for anyone to verify.
If Rio had released this as a closed model behind an API, the deception would have been much harder to prove. You cannot compare weight tensors you cannot see. You can probe behavior through an API, sure. But you cannot prove a merge with the same mathematical certainty.
This is one of the strongest arguments for open weights I have seen in a while. Not because open-source models are inherently better or safer. But because open weights make it possible to verify claims. They turn "trust us, we trained this from scratch" into something you can actually check.
The EU's AI Act includes some provisions around transparency for AI systems. But enforcement is still early, and the rules are vague about what "transparency" means in practice. In the meantime, the open-source community is doing the actual work of holding AI developers accountable. That is both impressive and a little worrying. It should not take another AI lab to catch a government misrepresenting its work.
What Needs to Change
If we want fewer of these incidents, a few things need to happen.
Government AI projects need mandatory disclosure requirements. If a public agency releases an AI model, it should document the base model, the training method, and the compute used. If it is a merge or a fine-tune, say so. There is nothing wrong with building on existing open-source work. There is everything wrong with hiding it.
We need independent benchmarking. Self-reported numbers from the same team that built the model are not reliable. We have seen this pattern too many times. Third-party evaluation, using standardized benchmarks with published methodology, should be the norm, not the exception.
The AI community needs to call this stuff out more often. Nex-AGI did the right thing by publishing their evidence publicly and letting people judge for themselves. More of that, please. Silence lets these claims go unchecked.
Where This Leaves Us
Rio-3.5-Open-397B is still up on Hugging Face. It still has over 112,000 downloads and 288 likes. The model probably still works fine. It is a decent merge of two good models, after all.
But the story of a Brazilian city building frontier AI from scratch? That story is dead. What we got instead is a cautionary tale about what happens when the pressure to announce a breakthrough collides with the reality of how these models are actually built.
The tools to build AI are more accessible than ever. Open base models, fine-tuning frameworks, and merge tools mean that small teams can produce useful models without billion-dollar budgets. That is real. But accessibility cuts both ways. It also means it is easier than ever to fake competence, to merge someone else's work and call it your own innovation.
The good news? The same openness that makes faking easier also makes catching fakes possible. Every published weight tensor is a piece of evidence. Every open-weights model can be inspected, compared, and verified. The next time a government or company announces a "homegrown" AI model, check the weights.
Someone else probably already is.