Claude Got Smarter But Forgot How To Use Tools — And Other AI Oddities This Week

#ai #llm #opensource #tech

You know how sometimes an update makes things worse instead of better? Turns out even the frontier models aren't immune.

Armin Ronacher — the guy behind Flask, Jinja2, and generally someone who knows what he's talking about — dropped a piece this week that caught my attention. He's been tracking a weird regression in newer Claude models (Opus 4.8 and Sonnet 5 specifically). The issue? They keep inventing random fields when calling Pi's edit tool. Fields like requireUnique, oldText2, newText2, matchCase, forceMatchCount, even event.0.additionalProperties. The actual edit content is usually correct, but the tool call gets rejected because the model decided to add extra keys that don't exist in the schema.

Honestly, this is fascinating because it's not a "small model can't do it" problem. It's the opposite: the smarter models are worse at this specific thing than their older siblings. Ronacher points out that tool calls are learned conventions, not magic. The model generates text that looks like a function call, and the API parses it. In the setup he describes, there is no grammar-aware decoding and no constrained sampling to force the output to match the schema, so nothing stops the model from emitting a key that was never declared. And apparently the newer training runs optimized for something else, at the cost of this particular skill.
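To make the failure mode concrete, here is a minimal Python sketch of what a strict tool harness does with those extra keys. The schema is hypothetical: the field names `path`, `oldText`, and `newText` are my assumption, not Pi's actual edit tool definition, and `check_arguments` and `drop_unknown_fields` are made-up helper names. It only checks for unknown and missing keys, which is a small slice of real JSON Schema validation.

```python
from typing import Any

# Hypothetical schema for illustration only; the real edit tool's
# fields are not given in the post.
EDIT_TOOL_SCHEMA = {
    "type": "object",
    "properties": {
        "path": {"type": "string"},
        "oldText": {"type": "string"},
        "newText": {"type": "string"},
    },
    "required": ["path", "oldText", "newText"],
    "additionalProperties": False,
}


def check_arguments(args: dict[str, Any], schema: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the call is accepted."""
    allowed = set(schema["properties"])
    problems = [f"unknown field: {key}" for key in sorted(args.keys() - allowed)]
    problems += [
        f"missing field: {key}" for key in schema["required"] if key not in args
    ]
    return problems


def drop_unknown_fields(args: dict[str, Any], schema: dict[str, Any]) -> dict[str, Any]:
    """Lenient alternative: keep only the keys the schema declares."""
    allowed = set(schema["properties"])
    return {key: value for key, value in args.items() if key in allowed}


# A call where the edit itself is fine but two invented keys tag along.
call = {
    "path": "app.py",
    "oldText": "x = 1",
    "newText": "x = 2",
    "requireUnique": True,
    "matchCase": False,
}

print(check_arguments(call, EDIT_TOOL_SCHEMA))
# ['unknown field: matchCase', 'unknown field: requireUnique']
print(check_arguments(drop_unknown_fields(call, EDIT_TOOL_SCHEMA), EDIT_TOOL_SCHEMA))
# []
```

Silently dropping unknown fields is a design choice, and not obviously the right one: it rescues calls where the edit content is correct, but it also hides the regression. Rejecting the call and handing the error list back to the model is the stricter alternative.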

I've been using Claude for coding sessions myself, and I've noticed it sometimes throws in extra parameters that make no sense. Never could put my finger on why until now. There's a broader lesson here — getting better at conversation doesn't automatically mean getting better at structured output.

Palantir's CEO Has Some Thoughts About Your Data

Alex Karp dropped a nine-point manifesto this week, and it's worth reading even if you're not a Palantir fan. The core message: don't hand your proprietary data to LLM companies. "There is a reason why those selling tokens refuse to..." — the quote cuts off in my feed but the point is clear.

It's a rare moment of honesty from someone in the industry. Most AI companies want you to believe your data is safe with them. Karp is saying the opposite: if your data is your moat, handing it to a third-party model provider is a strategic blunder. Palantir's approach pushes for keeping the model at arm's length from your core IP.

I'm not sure I buy the full manifesto — Palantir obviously wants to sell you their own platform — but the warning about data leakage is valid. We've seen enough leaks and training-data controversies by now. If you're running a business that relies on proprietary data, keep this in mind before you pipe everything into a third-party API.

Base44's Reaction To AI-Slop: Build Your Own Model

The vibe coding startup Base44 got tired of the generic output that frontier models produce and decided to train its own LLM, called Base 1. CEO Maor Shlomo said the company wanted to "ditch the cookie-cutter look of AI-coded websites."

This is interesting because it's a counter-trend to the usual "just use GPT-4/Claude for everything" approach. Base44 is basically saying: the frontier models are too generic. They produce what everyone else produces. If you want something that actually looks and feels distinct, you need a model that was trained on your specific design philosophy.

From my perspective, this is going to become more common. As more people realize that the big models produce increasingly homogeneous output, the value of specialized, smaller models goes up. The downside? Training and maintaining your own LLM is expensive and requires talent most companies don't have. Base44 is a startup with a specific use case — I'm not sure this scales to every business that wants to use AI.

The EU Is Making It Harder To Ship LLMs

A new GovAI study dropped some numbers that aren't surprising but still worth noting: EU data protection rules are directly slowing LLM deployment. About 11% of advanced LLM releases are delayed or blocked in Europe compared to the US.

The Digital Markets Act (DMA) and the AI Act are creating a compliance burden that global tech companies are increasingly citing as the reason their latest services don't launch in the EU. From a user perspective, this means you're getting fewer AI tools in Europe, or you're getting them later.

To be fair, the EU has a point about wanting guardrails. But the disconnect between the pace of AI development and the pace of regulation is getting wider. If you're in Europe and wondering why some AI features take forever to arrive — this is a big part of the answer.

When AI Writes Better Than Humans — And Prizes Can't Tell

A short story called "The Serpent In The Grove" by Jamir Nazir just won the Commonwealth Short Story Prize (worth about Rs 6 lakh). The problem? People are accusing it of being 100% AI-generated. LLM-detection tools flagged it, and the literary community is in an uproar.

Nazir denies it, of course. But the fact that this debate is happening at all says something about where we are. The Guardian also ran a big piece this week asking whether the next great novel could be written by AI, with linguists trying to explain what actually distinguishes human prose from machine output.

I'm not on the side of "AI will replace writers." But I do think the line is getting blurry faster than most people expected. Three years ago, the idea of an AI-generated story winning a major literary prize would have been a joke. Now it's a national headline.

The AI Chip Bet That's Turning Heads

Michael Burry, the investor famous for betting against the housing market ahead of the 2008 crash, is now shorting Micron. His argument: the AI chip bubble is real, and the semiconductor stocks that have been riding the AI wave are overvalued.

From a practical standpoint, this matters because the AI chip market has been the foundation underneath everything else. If the chip demand softens, the whole AI infrastructure buildout slows down. Not saying Burry is always right (he's been early on plenty of calls), but when someone with his track record makes a bet this public, it's worth paying attention.

On the hardware side, MINIX dropped something that's equal parts impressive and ridiculous: the ER939-AI Pro, a mini PC with 128GB of RAM, an AMD Ryzen AI Max+ 395 rated at 126 TOPS, dual 10GbE networking, and a vegan leather handle. Because why not. If you're running local LLMs and want something that fits on your desk, this is the kind of machine that makes local inference actually viable. With 128GB, you can load quantized models that are far too big for a typical consumer GPU's VRAM and would otherwise push you to a cloud instance. The leather handle? I have no explanation.
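For a rough sense of what 128GB buys you, the back-of-the-envelope arithmetic is parameters times bits per weight, divided by eight. The Python sketch below is an estimate under loud assumptions: it counts weights only (no KV cache, activations, or OS overhead), the 70B model size is just an example, and the 0.8 usable fraction is a number I picked for illustration, not a MINIX or AMD spec.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits_in_ram(
    params_billion: float,
    bits_per_weight: float,
    ram_gb: float = 128.0,
    usable_fraction: float = 0.8,  # assumed headroom, not a vendor figure
) -> bool:
    """True if the weights fit in the assumed usable share of RAM."""
    return weight_memory_gb(params_billion, bits_per_weight) <= ram_gb * usable_fraction


# Example: a hypothetical 70B-parameter model at three precisions.
for bits in (16, 8, 4):
    size = weight_memory_gb(70, bits)
    print(f"70B at {bits}-bit: {size:.0f} GB, fits: {fits_in_ram(70, bits)}")
```

Under these assumptions the 16-bit weights come to 140 GB and don't fit, while the 8-bit (70 GB) and 4-bit (35 GB) versions do. Real usage will be higher once context length and runtime overhead are counted.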

What I'm Watching Next

The tool-calling regression in Claude is the kind of bug that doesn't make headlines but affects anyone using these models for actual work. If you're building on top of LLM APIs, expect more of these edge cases as models get optimized for different metrics. The Base44 approach — building your own model for your specific use case — is expensive but might be the only way to get output that doesn't look like everyone else's.

If you're running models locally, the MINIX box is worth a look if you've got the budget. For everyone else, the Palantir data warning is probably the most actionable takeaway: think twice before you pipe your proprietary data through a third-party API.

What's been your experience with the newer Claude models? Noticed any weird tool behavior? Drop a comment — I'm genuinely curious if this is widespread or just a Pi-specific quirk.
