OpenAI Rolls Out Its First Custom Chip, Shopify Builds a Model-Agnostic AI Stack, and Gemini's Image Gen Goes Free
Been a busy couple of days in AI land. Let me break down what actually matters — skip the noise, keep the signal.
OpenAI Jalapeño — The Chip Story That Actually Changes Things
OpenAI finally showed their hand on custom silicon. The chip is called Jalapeño, developed with Broadcom, and it's built specifically for LLM inference — not training, not general-purpose compute, just the part where a model actually responds to your prompt.
A few things stand out here. First, OpenAI says Jalapeño delivers more compute with less energy than current leading chips, though they haven't published benchmarks yet, so take that with a grain of salt. Second, this isn't a one-off — they're planning multiple generations, with gigawatt-scale data center deployment starting this year in partnership with Microsoft and others.
What this really means: OpenAI is trying to cut Nvidia out of the loop. Every ChatGPT query today runs on hardware they don't fully control. Jalapeño changes that math. For end users, this should mean faster responses and lower costs over time — but we're probably looking at late 2026 or early 2027 before that materializes. Early tests are running on GPT-5.3-Codex-Spark internally.
Greg Brockman called it part of a "long-term, full-stack infrastructure strategy." That's corporate speak for "we want to own the whole stack, from silicon to UI."
Honestly, the name is a bit silly, but the move is real. Broadcom's CEO said they're aiming for "gigawatt-scale data centers beginning in 2026." That's not small talk.
One caveat: there are no public benchmarks yet. Let's see what the chip actually does in production before calling it an Nvidia killer.
Shopify Shows How to Build AI Infrastructure That Doesn't Care Which Models Live or Die
VentureBeat ran a solid piece on Shopify's internal AI architecture, and it's worth reading if you're building anything with LLMs at scale.
Shopify built an LLM proxy that gives every engineer access to multiple AI providers with automatic failover. When Claude Fable 5 got shut down by Anthropic, Shopify's engineers didn't blink — the proxy shifted them to Opus or GPT 5.5 automatically.
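The proxy's core behavior is simple: try providers in priority order and fall over to the next one when a call fails. Here is a minimal Python sketch of that idea. It is not Shopify's code. The provider callables, `ProviderError`, and `complete_with_failover` are hypothetical stand-ins for real SDK clients, and a production proxy would also need timeouts, rate limiting, and auth.

```python
from typing import Callable, Sequence


class ProviderError(Exception):
    """Hypothetical error a provider wrapper raises when a call fails (model retired, outage, etc.)."""


def complete_with_failover(
    prompt: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each (name, call) provider in priority order.

    Returns (provider_name, response) from the first provider that succeeds.
    Raises RuntimeError listing every failure if all providers fail.
    """
    errors: list[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            # Remember why this provider failed, then fall through to the next one.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Usage with two fake providers: the primary has been retired, the backup answers.
def retired_model(prompt: str) -> str:
    raise ProviderError("model retired")


def backup_model(prompt: str) -> str:
    return f"echo: {prompt}"


name, reply = complete_with_failover(
    "hello", [("primary", retired_model), ("backup", backup_model)]
)
print(name, reply)  # backup echo: hello
```

Callers never see the retired model. They get whichever provider answered, so the application code needs no changes when the primary disappears.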
Their distillation pipeline is even more interesting. They run a tool called UDP (it's an internal platform) where engineers feed in a teacher model, training data, evals, and a target student model — say, Opus 4.8 distilling down to Qwen 3.5. The pipeline runs for about a day, then returns evals on speed, cost, and accuracy. If the tradeoff looks good, the engineer deploys it. No approval process needed.
Farhan Thawar (head of engineering) says in some cases distilled models are 30x cheaper and faster than the frontier models. 30x. That's not an optimization tweak — that's a different category of product.
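The last step of that pipeline is comparing the student's evals against the teacher's and deciding whether the tradeoff is worth it. That step can be sketched as a simple rule. Everything here is hypothetical: the `EvalResult` fields, the 2-point accuracy tolerance, and the numbers in the usage example are invented for illustration (chosen to mirror the 30x figure). At Shopify this judgment is made by the engineer, not by a fixed formula.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    name: str
    accuracy: float           # fraction of eval cases passed, 0..1
    cost_per_1k_calls: float  # dollars, illustrative units
    p50_latency_ms: float     # median latency


def should_deploy(teacher: EvalResult, student: EvalResult,
                  max_accuracy_drop: float = 0.02) -> tuple[bool, float]:
    """Return (deploy?, teacher-to-student cost ratio).

    Hypothetical rule: deploy only if the student is cheaper, no slower,
    and loses at most `max_accuracy_drop` absolute accuracy.
    """
    cost_ratio = teacher.cost_per_1k_calls / student.cost_per_1k_calls
    ok = (
        teacher.accuracy - student.accuracy <= max_accuracy_drop
        and student.cost_per_1k_calls < teacher.cost_per_1k_calls
        and student.p50_latency_ms <= teacher.p50_latency_ms
    )
    return ok, cost_ratio


# Made-up eval numbers, for illustration only.
teacher = EvalResult("frontier-teacher", accuracy=0.91, cost_per_1k_calls=30.0, p50_latency_ms=1200.0)
student = EvalResult("distilled-student", accuracy=0.90, cost_per_1k_calls=1.0, p50_latency_ms=150.0)
print(should_deploy(teacher, student))  # (True, 30.0)
```

Here the student gives up one point of accuracy in exchange for being 30x cheaper and faster, which is the kind of trade that makes "no approval process needed" reasonable.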
They also have circuit breakers — if someone's been running a model for 10 hours burning tokens, they get a ping. Sometimes the reply is "yeah I meant to do that." Other times it's "oh crap."
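A usage circuit breaker in that spirit watches runtime and token count and pings once when either crosses a threshold. The sketch below shows one way to build it. The class name, the thresholds, and the `notify` callback are assumptions for illustration. It only alerts and never kills the job, which matches the "send a ping" behavior described above. It is not a confirmed Shopify implementation.

```python
import time
from typing import Callable


class UsageCircuitBreaker:
    """Alerts once when a session exceeds a runtime or token budget (hypothetical thresholds)."""

    def __init__(self, max_seconds: float, max_tokens: int,
                 notify: Callable[[str], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_seconds = max_seconds
        self.max_tokens = max_tokens
        self.notify = notify
        self.clock = clock          # injectable so it can be tested without waiting hours
        self.started_at = clock()
        self.tokens_used = 0
        self.tripped = False

    def record(self, tokens: int) -> None:
        """Record tokens from one model call and ping if a budget has been crossed."""
        self.tokens_used += tokens
        elapsed = self.clock() - self.started_at
        over_budget = elapsed >= self.max_seconds or self.tokens_used >= self.max_tokens
        if over_budget and not self.tripped:
            self.tripped = True     # ping only once per session
            self.notify(f"session running {elapsed / 3600:.1f}h, "
                        f"{self.tokens_used:,} tokens used")


# Usage with a fake clock: nothing at first, one ping after ten hours.
fake_now = [0.0]
alerts: list[str] = []
cb = UsageCircuitBreaker(max_seconds=10 * 3600, max_tokens=50_000_000,
                         notify=alerts.append, clock=lambda: fake_now[0])
cb.record(1_000)           # early: no alert
fake_now[0] = 10 * 3600    # ten hours later
cb.record(1_000)           # crosses the runtime threshold, so one alert fires
cb.record(1_000)           # already tripped, so no second alert
print(alerts)              # ['session running 10.0h, 2,000 tokens used']
```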
For anyone running AI in production: this is the playbook. Don't lock yourself into one provider. Build a proxy layer. Distill aggressively. Monitor everything.
Gemini's Personalized Image Generation Drops the Paywall
Google rolled out Nano Banana-powered image generation to free US users starting today. Previously this was locked behind Plus ($20/month), Pro, or Ultra tiers.
The interesting part isn't just that it's free — it's how it works. Gemini pulls data from your connected Google apps (Gmail, Photos, YouTube, Search) to understand your preferences, then generates images without you having to describe everything. Instead of "an illustration of me drinking coffee," you can just say "an illustration of me and my favorite things" and Gemini fills in the blanks from what it knows about you.
It's opt-in, so you control which apps Gemini can access. There's also a toggle to disable it per-prompt.
Gemini hit 750 million monthly active users earlier this year. That's a lot of people. Making personalized image gen free is clearly a play to pull even more users in — especially with ChatGPT and Claude both pushing multimodal features hard.
The privacy angle: this is Google's classic tradeoff. The feature works better the more data you give it. If you're already in Google's ecosystem, you probably won't mind. If you're not, you'll toggle it off. Reasonable middle ground.
The Cost of AI Coding Is About to Spike — Gartner's Warning
Gartner dropped a forecast that's getting some attention: by 2028, AI coding costs will surpass the average developer's salary. The driver is rising LLM token consumption and the shift to consumption-based licensing.
This is one of those numbers that sounds shocking until you think about it. If your team uses AI coding assistants heavily, token costs scale with usage, not with headcount. A team of 10 developers generating 100,000 tokens a day each is already at a million tokens a day, and the bill grows with every extra token each developer burns. Gartner's point is that companies are sleepwalking into this cost structure without building the infrastructure to manage it.
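The arithmetic behind that example is worth making explicit. The sketch below assumes a blended price of $10 per million tokens and 22 working days a month. Both are placeholders, not real vendor pricing, so plug in your own numbers.

```python
def monthly_token_cost(developers: int, tokens_per_dev_per_day: int,
                       usd_per_million_tokens: float,
                       working_days: int = 22) -> tuple[int, float]:
    """Return (tokens per month, dollars per month) for a team.

    The price per million tokens and the 22 working days are assumptions.
    """
    tokens = developers * tokens_per_dev_per_day * working_days
    return tokens, tokens / 1_000_000 * usd_per_million_tokens


print(monthly_token_cost(10, 100_000, 10.0))     # (22000000, 220.0)
print(monthly_token_cost(10, 10_000_000, 10.0))  # (2200000000, 22000.0)
```

At 100,000 tokens per developer per day the bill is modest. The point is the linear scaling: multiply per-developer consumption by 100 and the bill multiplies by 100, with no new hires involved.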
Paired with Shopify's playbook above, the message is clear: if you're not caching, compressing prompts, and distilling models today, you're building a ticking time bomb in your budget.
SitePoint published a practical guide showing how prompt compression, semantic caching, chain-of-thought (CoT) pruning, and output constraints can cut LLM API costs by up to 63%. Tools like LLMLingua for prompt compression and semantic caching with embedding similarity are already production-ready. The savings aren't theoretical; they're measurable.
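Semantic caching reuses a stored answer when a new prompt is close enough in meaning to one you've already paid for. Here is a minimal sketch of that lookup. It does not use LLMLingua or any real embedding API. A toy bag-of-words vector stands in for an embedding model, the 0.8 cosine threshold is an arbitrary assumption, and a production cache would use a vector index instead of a linear scan.

```python
import math
from collections import Counter
from typing import Callable, Optional


def toy_embed(text: str) -> Counter:
    """Stand-in for a real embedding model: a lowercase bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, embed: Callable[[str], Counter], threshold: float = 0.8) -> None:
        self.embed = embed
        self.threshold = threshold
        self.entries: list[tuple[Counter, str]] = []

    def get(self, prompt: str) -> Optional[str]:
        """Return the cached answer for the most similar prompt, if it clears the threshold."""
        query = self.embed(prompt)
        best_score, best_answer = 0.0, None
        for vec, answer in self.entries:
            score = cosine(query, vec)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

    def put(self, prompt: str, answer: str) -> None:
        self.entries.append((self.embed(prompt), answer))


cache = SemanticCache(toy_embed, threshold=0.8)
cache.put("how do I reset my password", "Go to Settings > Security.")
hit = cache.get("how do i reset my password please")  # similar wording: served from cache
miss = cache.get("what is the capital of france")     # unrelated: None, so call the LLM
print(hit, miss)
```

Every hit is an API call you don't pay for. The threshold is the main knob: set it too low and users get answers to questions they didn't ask.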
Microsoft Faces Fresh Heat Over OpenAI Supercomputer
The NYT filed a new motion claiming Microsoft actively encouraged OpenAI to train on copyrighted content. The filing alleges Microsoft built the supercomputer infrastructure knowing it would be used to ingest copyrighted material at scale.
This case has been winding through courts for a while, but what's notable here is the specific allegation about the infrastructure provider's liability. If the court finds Microsoft complicit — not just a hosting provider but a collaborator — it could set a precedent for how cloud providers are held accountable for what runs on their hardware.
The Supreme Court recently sided with Cox Communications in a similar ISP liability case, so the legal landscape is shifting. Worth watching if you're building on top of someone else's infrastructure.
Quick Hits
- Red Hat introduced "goose" — an agentic AI assistant for RHEL troubleshooting. Open source, runs locally on your system. If you're a sysadmin running Red Hat infra, this might actually save you time.
- India's ICAI teamed up with Sarvam AI to build a dedicated LLM for Chartered Accountants — domain-specific, privacy-preserving. More countries are going to do this with their regulated professions.
- South Korea's AI megaprojects aim to keep pace with China, though workers are reluctant to relocate from Seoul. Infrastructure bottlenecks are real everywhere.
- A researcher ran a local LLM for hours and watched it degrade — turns out RTX 5090 VRAM limits become brutal with long context windows. Good reminder that local AI still has hard ceilings.
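One reason long contexts hit VRAM ceilings is that the attention KV cache grows linearly with context length, on top of the model weights. The back-of-the-envelope estimate below assumes a hypothetical config loosely modeled on an 8B-class model with grouped-query attention (32 layers, 8 KV heads, head dimension 128, fp16). Your model's config will differ, and runtimes add their own overhead.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_elem: int = 2) -> int:
    """Approximate KV-cache size: one K and one V vector per layer, per KV head, per token."""
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_elem


GIB = 1024 ** 3
for ctx in (8_192, 32_768, 131_072):
    size = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128, context_tokens=ctx)
    print(f"{ctx:>7} tokens -> {size / GIB:.1f} GiB of KV cache")
```

Under these assumptions the cache costs 128 KiB per token, so it goes from 1 GiB at 8K tokens to 16 GiB at 128K, before counting the weights.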
Cover image: Generated by Agnes AI. Abstract neural network visualization with chip architecture overlay.
