<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: gentic news</title>
    <description>The latest articles on DEV Community by gentic news (@gentic_news).</description>
    <link>https://dev.to/gentic_news</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3838995%2F269c20bb-f64f-483a-862d-49c6481df897.png</url>
      <title>DEV Community: gentic news</title>
      <link>https://dev.to/gentic_news</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/gentic_news"/>
    <language>en</language>
    <item>
      <title>ClawIDE: A Web-Based IDE for Managing Multiple Claude Code Sessions</title>
      <dc:creator>gentic news</dc:creator>
      <pubDate>Sun, 12 Apr 2026 04:30:16 +0000</pubDate>
      <link>https://dev.to/gentic_news/clawide-a-web-based-ide-for-managing-multiple-claude-code-sessions-e04</link>
      <guid>https://dev.to/gentic_news/clawide-a-web-based-ide-for-managing-multiple-claude-code-sessions-e04</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;ClawIDE is a free, open-source web IDE that enables developers to manage multiple concurrent Claude Code sessions, addressing a core limitation of the terminal-based workflow.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  ClawIDE: A Web-Based IDE for Managing Multiple Claude Code Sessions
&lt;/h1&gt;

&lt;h2&gt;
  
  
  What It Does — A Session Manager for Claude Code
&lt;/h2&gt;

&lt;p&gt;ClawIDE is a free and open-source integrated development environment built specifically to manage multiple Claude Code sessions. While Claude Code itself is a powerful terminal-based agent, it's designed to run as a single session per terminal instance. ClawIDE solves this by providing a web interface where you can launch, monitor, and switch between multiple Claude Code sessions simultaneously.&lt;/p&gt;

&lt;p&gt;This addresses a real pain point: developers working on multiple projects or features often need separate Claude Code contexts. Previously, this required multiple terminal windows or complex tmux/screen setups. ClawIDE centralizes this management in a browser tab.&lt;/p&gt;
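&lt;p&gt;For context, the manual workaround looks something like this. A minimal sketch, assuming tmux and the &lt;code&gt;claude&lt;/code&gt; CLI are installed; the session names and directories are illustrative, and &lt;code&gt;sleep 300&lt;/code&gt; stands in for the long-running &lt;code&gt;claude&lt;/code&gt; process so the sketch runs anywhere:&lt;/p&gt;

```shell
# One detached tmux session per Claude Code context.
# In real use, the quoted command would be 'claude' instead of 'sleep 300';
# -s names the session, -c sets its working directory.
tmux new-session -d -s backend  -c /tmp 'sleep 300'
tmux new-session -d -s frontend -c /tmp 'sleep 300'

# List the running contexts, then attach to whichever one you need.
tmux ls
# tmux attach -t backend   # interactive: drops you into that session
```

&lt;p&gt;ClawIDE replaces this juggling of named sessions and attach/detach commands with browser tabs and a single dashboard.&lt;/p&gt;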

&lt;h2&gt;
  
  
  Why It Matters — Parallel Development Workflows
&lt;/h2&gt;

&lt;p&gt;Claude Code's strength is its deep integration with your file system and shell. However, being tied to a single terminal session limits parallel workflows. Imagine:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Having one session refactoring a backend API while another builds a React component&lt;/li&gt;
&lt;li&gt;Debugging a production issue in one session while prototyping a new feature in another&lt;/li&gt;
&lt;li&gt;Running long-running tests or data migrations in a background session while continuing development&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;ClawIDE makes these scenarios practical. Each session maintains its own context, file access, and conversation history. This follows Claude Code's recent focus on workflow efficiency, like the Tool Search feature that defers MCP tool definitions to save 90% of context tokens.&lt;/p&gt;

&lt;h2&gt;
  
  
  How To Use It — Getting Started with ClawIDE
&lt;/h2&gt;

&lt;p&gt;Since ClawIDE is open source, you can run it locally or use the hosted version at clawide.app. The setup is straightforward:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Clone the repository&lt;/span&gt;
git clone https://github.com/[username]/clawide
&lt;span class="nb"&gt;cd &lt;/span&gt;clawide

&lt;span class="c"&gt;# Install dependencies&lt;/span&gt;
npm &lt;span class="nb"&gt;install&lt;/span&gt;

&lt;span class="c"&gt;# Start the development server&lt;/span&gt;
npm run dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once running, you'll see a dashboard where you can:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Start New Sessions&lt;/strong&gt;: Click "New Session" to launch a fresh Claude Code instance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Configure Sessions&lt;/strong&gt;: Set working directories, environment variables, or specific Claude Code flags&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Switch Between Sessions&lt;/strong&gt;: Click between tabs to move between different Claude Code contexts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor Activity&lt;/strong&gt;: See which sessions are active and their recent commands&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The interface provides terminal-like interaction with each Claude Code instance while maintaining the convenience of browser tabs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Integration with Your Existing Setup
&lt;/h2&gt;

&lt;p&gt;ClawIDE doesn't replace your editor—it complements it. You can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Keep using VS Code or JetBrains IDEs for editing&lt;/li&gt;
&lt;li&gt;Use ClawIDE specifically for Claude Code interactions&lt;/li&gt;
&lt;li&gt;Copy commands and outputs between your IDE and ClawIDE sessions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This aligns with Claude Code's multi-platform strategy (CLI + VS Code + JetBrains + web). ClawIDE extends the web component specifically for session management.&lt;/p&gt;

&lt;h2&gt;
  
  
  When To Reach for ClawIDE
&lt;/h2&gt;

&lt;p&gt;Use ClawIDE when you need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Parallel Development&lt;/strong&gt;: Multiple features or bug fixes in progress simultaneously&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context Separation&lt;/strong&gt;: Clean separation between different project contexts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Long-Running Tasks&lt;/strong&gt;: Background Claude Code sessions for migrations, tests, or data processing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Team Collaboration&lt;/strong&gt;: Sharing specific Claude Code sessions with team members (future feature)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For single-session work, the standard Claude Code terminal or IDE integration remains optimal. But for complex development workflows, ClawIDE provides the missing session management layer.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Open Source Advantage
&lt;/h2&gt;

&lt;p&gt;Being open source means you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Self-host for security-sensitive projects&lt;/li&gt;
&lt;li&gt;Customize the interface for your team's workflow&lt;/li&gt;
&lt;li&gt;Add integrations with your internal tools&lt;/li&gt;
&lt;li&gt;Contribute features back to the community&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This follows the broader trend of Claude Code's ecosystem growth, where third-party tools like smart_approve.py and SNARC already integrate with the platform.&lt;/p&gt;

&lt;h2&gt;
  
  
  Limitations and Considerations
&lt;/h2&gt;

&lt;p&gt;ClawIDE is a new project with minimal documentation and community discussion (only 3 points and 2 comments on Hacker News at publication). Early adopters should expect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Basic functionality without advanced features&lt;/li&gt;
&lt;li&gt;Potential stability issues in early releases&lt;/li&gt;
&lt;li&gt;Limited integration with local development tools&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;However, for developers hitting the single-session limitation of Claude Code, ClawIDE offers a practical solution worth exploring.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It Today
&lt;/h2&gt;

&lt;p&gt;Visit &lt;a href="https://www.clawide.app/" rel="noopener noreferrer"&gt;clawide.app&lt;/a&gt; to try the hosted version, or clone the repository to run it locally. Start by creating two sessions: one for your main project and another for experimentation. Notice how you can maintain separate conversation contexts while switching between them instantly—something impossible with standard Claude Code alone.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://gentic.news/article/clawide-a-web-based-ide-for" rel="noopener noreferrer"&gt;gentic.news&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>programming</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Claude Code Digest — Apr 08–Apr 11</title>
      <dc:creator>gentic news</dc:creator>
      <pubDate>Sat, 11 Apr 2026 22:30:08 +0000</pubDate>
      <link>https://dev.to/gentic_news/claude-code-digest-apr-08-apr-11-3b0d</link>
      <guid>https://dev.to/gentic_news/claude-code-digest-apr-08-apr-11-3b0d</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Cut financial data token burn by 90% using the PTC pattern with MCP servers.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Also in this digest: 66 architecture tickets shipped in 4 hours using Claude Code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Trending Now
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;🔥 Grainulator: Claude Code's Fact-Checking Engine&lt;/strong&gt;&lt;br&gt;
Transforms Claude Code into a research engine that verifies its output with typed claims and confidence scoring. Use it to ensure AI-generated content is accurate and reliable.&lt;br&gt;
&lt;strong&gt;✨ SciAgent-Skills: Bioinformatics on Demand&lt;/strong&gt;&lt;br&gt;
Add 197 bioinformatics skills to Claude Code without extra setup. This empowers researchers to leverage Claude's capabilities in specialized fields instantly.&lt;br&gt;
&lt;strong&gt;📈 3-Tier Compaction: Token Efficiency Boost&lt;/strong&gt;&lt;br&gt;
Claude Code's new compaction system preserves conversation context while optimizing token usage, making long sessions more cost-effective.&lt;/p&gt;

&lt;h2&gt;
  
  
  Best Practices
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Use session hooks to enforce CLAUDE.md rules automatically.&lt;/strong&gt;&lt;br&gt;
Before: Manual rule enforcement was prone to errors. After: Automated hooks ensure rules are consistently applied, reducing oversight.&lt;br&gt;
&lt;strong&gt;Install Kerf-CLI to track and manage Claude Code spending.&lt;/strong&gt;&lt;br&gt;
Before: Unchecked spending on Opus. After: Kerf-CLI provides a cost dashboard that enforces budgets and identifies waste.&lt;br&gt;
&lt;strong&gt;Swap large plugins for lightweight Rust MCP servers to save resources.&lt;/strong&gt;&lt;br&gt;
Before: High memory usage with bulky plugins. After: 95% reduction in memory use with efficient Rust MCP servers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tools &amp;amp; MCP
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Vulnetix VDB&lt;/strong&gt; — Real-time package security scanning — catch vulnerabilities as you code.&lt;br&gt;
&lt;strong&gt;Hazmat&lt;/strong&gt; — Secures &lt;code&gt;--dangerously-skip-permissions&lt;/code&gt; for macOS — boosts autonomy safely.&lt;br&gt;
&lt;strong&gt;Agentic Copilot&lt;/strong&gt; — Run Claude Code inside Obsidian — eliminate context switching.&lt;/p&gt;

&lt;h2&gt;
  
  
  Multi-Agent Patterns
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;PTC Pattern&lt;/strong&gt;&lt;br&gt;
Wraps MCP servers in Python modules for in-workspace data processing, reducing token burn by 90%.&lt;br&gt;
&lt;strong&gt;Claude Managed Agents&lt;/strong&gt;&lt;br&gt;
Turns long-running agents into API calls, simplifying durable app development.&lt;br&gt;
&lt;strong&gt;Autonomous Kanban Execution&lt;/strong&gt;&lt;br&gt;
Connect Claude Code to EClaw for seamless task execution and reporting.&lt;/p&gt;

&lt;h2&gt;
  
  
  Community Requests
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Native MCP server benchmarking tool&lt;/li&gt;
&lt;li&gt;Real-time Claude Code performance analytics&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://gentic.news/article/claude-code-community-digest-apr-11-2026" rel="noopener noreferrer"&gt;gentic.news&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>tech</category>
      <category>product</category>
    </item>
    <item>
      <title>EkyBot Lets Claude Code Talk to Other AI Agents via @mentions</title>
      <dc:creator>gentic news</dc:creator>
      <pubDate>Sat, 11 Apr 2026 22:30:05 +0000</pubDate>
      <link>https://dev.to/gentic_news/ekybot-lets-claude-code-talk-to-other-ai-agents-via-mentions-577h</link>
      <guid>https://dev.to/gentic_news/ekybot-lets-claude-code-talk-to-other-ai-agents-via-mentions-577h</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Claude Code users can now @mention other AI agents for specialized tasks, creating multi-agent workflows from a single interface.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  EkyBot Lets Claude Code Talk to Other AI Agents via @mentions
&lt;/h1&gt;

&lt;h2&gt;
  
  
  What It Does
&lt;/h2&gt;

&lt;p&gt;EkyBot is an open-source bridge that connects Claude Code with other AI agents—specifically OpenClaw (custom local agents) and Claude Cowork—into a single collaborative interface. Think of it as Slack for your AI agents: you create channels, @mention agents, and they work together on tasks while maintaining their individual runtimes.&lt;/p&gt;

&lt;p&gt;For Claude Code users, this means your coding assistant can now directly ask other specialized agents for help. Need data analysis? @mention an OpenClaw agent running a data pipeline. Need documentation written? @mention Claude Cowork. All conversations stay in one place with full context.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setup
&lt;/h2&gt;

&lt;p&gt;Since EkyBot is open-source, setup involves:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.ekybot.com%2F_next%2Fimage%3Furl%3D%252Fscreenshots%252F03-costs-en.png%26w%3D1080%26q%3D75%26dpl%3Ddpl_9p4ckq4bc21xu9Eu4bd1LbE7nu4H" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.ekybot.com%2F_next%2Fimage%3Furl%3D%252Fscreenshots%252F03-costs-en.png%26w%3D1080%26q%3D75%26dpl%3Ddpl_9p4ckq4bc21xu9Eu4bd1LbE7nu4H" alt="Control your Costs" width="1080" height="2374"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Install the bridge&lt;/strong&gt; from their GitHub repository&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Configure your agents&lt;/strong&gt; by connecting Claude Code (running locally via CLI), OpenClaw agents, and Claude Cowork&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Create channels&lt;/strong&gt; for different projects or workflows&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Start collaborating&lt;/strong&gt; using @mentions between agents&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The key technical detail: Claude Code maintains its local runtime on your machine—it's not running in EkyBot's cloud. EkyBot handles the routing and conversation management while your actual Claude Code instance stays local and secure.&lt;/p&gt;

&lt;h2&gt;
  
  
  When To Use It
&lt;/h2&gt;

&lt;p&gt;This shines for complex workflows where Claude Code needs specialized help:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data-intensive tasks&lt;/strong&gt;: "&lt;a class="mentioned-user" href="https://dev.to/openclaw"&gt;@openclaw&lt;/a&gt;, query the production database for last month's user metrics and summarize them for me"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Documentation generation&lt;/strong&gt;: "I've refactored this API. &lt;a class="mentioned-user" href="https://dev.to/claude"&gt;@claude&lt;/a&gt; Cowork, can you draft updated documentation based on the changes?"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-step deployments&lt;/strong&gt;: Claude Code handles the code changes, then @mentions another agent to run tests, then another to deploy&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Research + implementation&lt;/strong&gt;: Claude Cowork researches best practices for a feature, then @mentions Claude Code to implement it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The @mention syntax is natural: just type &lt;code&gt;@AgentName&lt;/code&gt; followed by your request, and EkyBot routes it to the correct agent while maintaining conversation history.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost and Token Control
&lt;/h2&gt;

&lt;p&gt;EkyBot includes a dashboard that tracks costs across all your agents. For Claude Code users on Pro/Max subscriptions, this gives visibility into:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.ekybot.com%2F_next%2Fimage%3Furl%3D%252Fscreenshots%252F02-agents-en.png%26w%3D1080%26q%3D75%26dpl%3Ddpl_9p4ckq4bc21xu9Eu4bd1LbE7nu4H" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.ekybot.com%2F_next%2Fimage%3Furl%3D%252Fscreenshots%252F02-agents-en.png%26w%3D1080%26q%3D75%26dpl%3Ddpl_9p4ckq4bc21xu9Eu4bd1LbE7nu4H" alt="Manage your Agents" width="1080" height="2374"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Token usage per agent and per channel&lt;/li&gt;
&lt;li&gt;Configurable daily/monthly budgets per agent&lt;/li&gt;
&lt;li&gt;Automatic memory compression to manage context windows&lt;/li&gt;
&lt;li&gt;A 4-level memory system (session, daily, long-term, project)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is particularly valuable when combining Claude Code (subscription-based) with OpenClaw agents (API-billed models) — you see all costs in one place.&lt;/p&gt;

&lt;h2&gt;
  
  
  Security Considerations
&lt;/h2&gt;

&lt;p&gt;Your Claude Code instance remains local. EkyBot creates an encrypted tunnel to route messages but doesn't host your Claude Code runtime. This maintains the security model you're used to with the Claude CLI while adding collaboration capabilities.&lt;/p&gt;

&lt;h2&gt;
  
  
  Coming Soon
&lt;/h2&gt;

&lt;p&gt;The roadmap includes integration with n8n, LangChain, CrewAI, and "any agent with an API." This suggests Claude Code could eventually collaborate with automation workflows, complex agent chains, and custom tools.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.ekybot.com%2F_next%2Fimage%3Furl%3D%252Fscreenshots%252F01-chat-en.png%26w%3D1080%26q%3D75%26dpl%3Ddpl_9p4ckq4bc21xu9Eu4bd1LbE7nu4H" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.ekybot.com%2F_next%2Fimage%3Furl%3D%252Fscreenshots%252F01-chat-en.png%26w%3D1080%26q%3D75%26dpl%3Ddpl_9p4ckq4bc21xu9Eu4bd1LbE7nu4H" alt="Inter-Agent Collaboration" width="1080" height="2374"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  gentic.news Analysis
&lt;/h2&gt;

&lt;p&gt;This development follows the broader trend of AI agent specialization and collaboration we've been tracking. While tools like Cursor focus on integrating multiple AI capabilities into a single editor, EkyBot takes a different approach: connecting specialized agents that maintain their native environments. This aligns with our coverage of the growing "AI team" paradigm, where different AI systems handle different aspects of development workflows.&lt;/p&gt;

&lt;p&gt;For Claude Code users, the immediate value is extending Claude's coding capabilities without leaving the collaborative interface. Instead of switching between Claude Code for development and other tools for research or data work, you can @mention the right agent for each task. This could significantly streamline workflows that involve both coding and adjacent tasks like data analysis, documentation, or research.&lt;/p&gt;

&lt;p&gt;The open-source nature is notable—it contrasts with proprietary agent collaboration platforms and gives developers control over their agent infrastructure. As the ecosystem expands with promised integrations (LangChain, CrewAI), Claude Code could become the coding specialist in increasingly sophisticated AI teams.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It Now
&lt;/h2&gt;

&lt;p&gt;If you regularly use Claude Code alongside other AI tools or custom agents:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Check the EkyBot GitHub for installation instructions&lt;/li&gt;
&lt;li&gt;Set up a test channel with Claude Code and one other agent (Claude Cowork is the easiest start)&lt;/li&gt;
&lt;li&gt;Try a simple workflow: Have Claude Cowork research a topic, then @mention Claude Code to implement a proof of concept&lt;/li&gt;
&lt;li&gt;Monitor the cost dashboard to understand token usage patterns across agents&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The key advantage for developers is eliminating context switching between different AI tools while maintaining each tool's specialized capabilities.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://gentic.news/article/ekybot-lets-claude-code-talk-to" rel="noopener noreferrer"&gt;gentic.news&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>programming</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>OpenAI Launches $100 ChatGPT Pro Tier, Targets Heavy Coding Users</title>
      <dc:creator>gentic news</dc:creator>
      <pubDate>Sat, 11 Apr 2026 16:30:09 +0000</pubDate>
      <link>https://dev.to/gentic_news/openai-launches-100-chatgpt-pro-tier-targets-heavy-coding-users-1fl2</link>
      <guid>https://dev.to/gentic_news/openai-launches-100-chatgpt-pro-tier-targets-heavy-coding-users-1fl2</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;OpenAI has launched a new $100/month ChatGPT Pro tier, offering 5x the usage of the $20 Plus plan. This move directly targets heavy daily coders, creating a three-tier subscription structure.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  OpenAI Launches $100 ChatGPT Pro Tier, Targets Heavy Coding Users
&lt;/h1&gt;

&lt;p&gt;OpenAI has introduced a new $100 per month subscription tier for ChatGPT Pro, creating a three-tier pricing structure for individual users. The move signals a strategic focus on capturing heavy daily users, particularly software developers, who require significantly higher message and compute limits.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's New: A Mid-Tier Pro Plan
&lt;/h2&gt;

&lt;p&gt;The new $100/month ChatGPT Pro plan sits between the existing $20/month ChatGPT Plus plan and the $200/month ChatGPT Pro tier. According to the announcement, this new tier offers access to all Pro features, including the exclusive Pro model and unlimited access to both "Instant" and "Thinking" models.&lt;/p&gt;

&lt;p&gt;The primary difference between the $200 and $100 plans is usage capacity. The $100 plan offers approximately 5x the usage limits of the $20 Plus plan, while the $200 plan provides roughly 20x the capacity of the base Plus tier.&lt;/p&gt;

&lt;h2&gt;
  
  
  Technical Details: Capacity Over Features
&lt;/h2&gt;

&lt;p&gt;Notably, OpenAI is differentiating these tiers by compute allocation rather than feature access. All Pro subscribers—whether on the $100 or $200 plan—receive the same model access and capabilities. The distinction is purely quantitative: how much you can use the service within a given billing period.&lt;/p&gt;

&lt;p&gt;This approach suggests OpenAI has identified a clear market segment: users who have outgrown the $20 Plus plan but don't require the maximum capacity of the $200 tier. The announcement specifically mentions this targets "heavy daily coders" who need "far higher message and compute limits" for serious software work.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Compares: OpenAI's Three-Tier Structure
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Plan&lt;/th&gt;
&lt;th&gt;Price&lt;/th&gt;
&lt;th&gt;Usage Relative to Plus&lt;/th&gt;
&lt;th&gt;Target User&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ChatGPT Plus&lt;/td&gt;
&lt;td&gt;$20/month&lt;/td&gt;
&lt;td&gt;1x (baseline)&lt;/td&gt;
&lt;td&gt;Casual users, general tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ChatGPT Pro (New)&lt;/td&gt;
&lt;td&gt;$100/month&lt;/td&gt;
&lt;td&gt;~5x&lt;/td&gt;
&lt;td&gt;Heavy daily users, developers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ChatGPT Pro&lt;/td&gt;
&lt;td&gt;$200/month&lt;/td&gt;
&lt;td&gt;~20x&lt;/td&gt;
&lt;td&gt;Maximum capacity users, teams&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This structure creates clear upgrade paths: from casual use ($20) to serious individual use ($100) to maximum capacity ($200). The $100 tier appears designed to capture users who might otherwise consider alternatives like GitHub Copilot Enterprise or Claude Pro.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to Watch: The Battle for Developer Workflows
&lt;/h2&gt;

&lt;p&gt;The announcement explicitly frames this as "a pricing fight over who owns the heavy daily coder." This acknowledges the competitive landscape where multiple AI coding assistants are vying for developer subscriptions.&lt;/p&gt;

&lt;p&gt;Early indicators suggest the $100 price point may be strategically positioned against competitors' offerings. The unlimited access to both Instant and Thinking models—regardless of tier—could be a differentiator against services that meter access to more capable models.&lt;/p&gt;

&lt;h2&gt;
  
  
  gentic.news Analysis
&lt;/h2&gt;

&lt;p&gt;This move represents a significant refinement of OpenAI's monetization strategy for ChatGPT. Following their initial launch of the $20 Plus tier in February 2023 and the introduction of the $200 Pro tier in 2024, this new mid-tier pricing indicates OpenAI has gathered sufficient usage data to identify distinct customer segments with different capacity needs.&lt;/p&gt;

&lt;p&gt;The timing is particularly noteworthy given the increased competition in the AI coding assistant space. Anthropic's Claude Code has been gaining traction among developers, while GitHub Copilot continues to dominate the integrated development environment market. By creating a dedicated tier for heavy coding use, OpenAI is directly addressing a high-value segment that generates disproportionate revenue and provides valuable feedback for model improvement.&lt;/p&gt;

&lt;p&gt;This pricing strategy also reflects a maturation of OpenAI's infrastructure economics. The fact that they can offer 5x capacity for 5x the price (while the 20x capacity costs 10x the price) suggests they've optimized their inference costs for high-volume users. The tiered approach allows them to capture more consumer surplus from power users while keeping casual users in the ecosystem at the entry level.&lt;/p&gt;

&lt;p&gt;Looking forward, watch for how competitors respond. If this $100 tier gains significant traction, we may see similar mid-tier offerings from other AI service providers. Additionally, the explicit focus on coding workloads suggests OpenAI may be preparing more specialized coding features or integrations that will be exclusive to Pro tiers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What's the difference between ChatGPT Plus and the new ChatGPT Pro $100 plan?
&lt;/h3&gt;

&lt;p&gt;The primary difference is usage capacity: the $100 Pro plan offers approximately 5x the messages and compute of the $20 Plus plan. Unlike Plus, the Pro plan also includes the exclusive Pro model and unlimited use of the Instant and Thinking models, along with the significantly higher limits needed for heavy usage scenarios like software development.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I switch from ChatGPT Plus to the new $100 Pro plan?
&lt;/h3&gt;

&lt;p&gt;Yes, existing Plus subscribers should be able to upgrade to the new $100 Pro tier through their account settings. The upgrade would immediately provide the increased usage limits while maintaining access to all Pro features. Downgrading back to Plus would also be possible, though potentially subject to billing cycle constraints.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is the $100 Pro plan better for coding than the $200 Pro plan?
&lt;/h3&gt;

&lt;p&gt;Both Pro plans offer the same model capabilities and features for coding tasks. The $200 plan simply provides approximately 20x the usage capacity of the Plus plan (compared to 5x for the $100 plan). For most individual developers, the $100 tier likely offers sufficient capacity, while the $200 tier would be appropriate for extremely heavy users or small teams sharing an account.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does this compare to GitHub Copilot pricing?
&lt;/h3&gt;

&lt;p&gt;GitHub Copilot currently costs $10/month for individuals or $19/user/month for business. However, direct comparison is complex as Copilot is specifically integrated into IDEs and optimized for code completion, while ChatGPT Pro offers broader conversational AI capabilities alongside coding assistance. The $100 ChatGPT Pro tier competes more directly with GitHub Copilot Enterprise ($39/user/month) for organizational use, though individual developers might choose based on their specific workflow preferences.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://gentic.news/article/openai-launches-100-chatgpt-pro" rel="noopener noreferrer"&gt;gentic.news&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>tech</category>
      <category>product</category>
    </item>
    <item>
      <title>EngineAI Raises $200M Series B, Valuation Hits $1.4B for Humanoid Robots</title>
      <dc:creator>gentic news</dc:creator>
      <pubDate>Sat, 11 Apr 2026 16:30:06 +0000</pubDate>
      <link>https://dev.to/gentic_news/engineai-raises-200m-series-b-valuation-hits-14b-for-humanoid-robots-34kl</link>
      <guid>https://dev.to/gentic_news/engineai-raises-200m-series-b-valuation-hits-14b-for-humanoid-robots-34kl</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Chinese robotics startup EngineAI raised $200 million in a Series B round, achieving a valuation exceeding $1.4 billion. The capital will accelerate the deployment of its humanoid robots across multiple industries.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  EngineAI Raises $200 Million in Series B, Valuation Exceeds $1.4 Billion
&lt;/h1&gt;

&lt;p&gt;Chinese robotics startup EngineAI has raised $200 million in a Series B funding round, propelling its valuation above $1.4 billion (RMB 10 billion). The company plans to use the capital to accelerate the deployment of its humanoid robots across multiple industries.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Deal
&lt;/h2&gt;

&lt;p&gt;EngineAI secured $200 million in its Series B financing round. The investment values the company at over $1.4 billion, a significant milestone that places it among the higher-valued startups in the competitive humanoid robotics sector. The specific lead investors were not disclosed in the initial report.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Company Does
&lt;/h2&gt;

&lt;p&gt;EngineAI is a robotics startup focused on developing and deploying humanoid robots. The company's technology is designed for functional purposes, aiming to interact with human tools and environments and work alongside people in various settings. The fresh capital injection is earmarked for scaling this deployment across multiple, unspecified industries.&lt;/p&gt;

&lt;h2&gt;
  
  
  Market Context
&lt;/h2&gt;

&lt;p&gt;The funding arrives amid intense global competition and investment in humanoid robotics. Companies like Tesla (with Optimus), Boston Dynamics, Figure AI, and numerous Chinese firms are racing to develop viable commercial platforms. EngineAI's billion-dollar-plus valuation signals strong investor confidence in its approach and the broader market's potential for humanoid robots designed for collaborative work.&lt;/p&gt;

&lt;p&gt;This funding round follows a pattern of increased activity in the sector. For instance, gentic.news recently reported that 26 humanoid robot brands are preparing to field over 300 units in Beijing's E-Town Half Marathon on April 19, 2026, highlighting the push for public demonstration and real-world testing.&lt;/p&gt;

&lt;h2&gt;
  
  
  gentic.news Analysis
&lt;/h2&gt;

&lt;p&gt;EngineAI's $200 million Series B is a substantial bet on the commercial viability of humanoid robots in industrial and service settings. A valuation exceeding $1.4 billion at this stage is aggressive, reflecting the high-stakes nature of the race. It suggests investors are backing EngineAI's specific technical roadmap and go-to-market strategy, not just the general category.&lt;/p&gt;

&lt;p&gt;The timing is notable. This capital infusion comes just weeks before a major public showcase of the technology in China—the Beijing E-Town Half Marathon where 26 brands will deploy over 300 robots. This indicates a sector-wide shift from pure R&amp;amp;D to demonstration and early deployment phases. EngineAI is likely using this funding to ensure it has the manufacturing capacity and software development resources to transition from prototypes to reliable, scaled units that can secure commercial contracts.&lt;/p&gt;

&lt;p&gt;The lack of disclosed lead investors is interesting. In a hot sector, major venture capital firms or strategic corporate investors (e.g., automotive, electronics, or logistics giants) often publicize their involvement. The silence could point to a consortium of investors or significant strategic backing from industry players looking to integrate humanoid automation into their operations, using EngineAI as a technology provider.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Who invested in EngineAI's Series B round?
&lt;/h3&gt;

&lt;p&gt;The initial report from Pandaily did not disclose the specific lead investors or participant names in the $200 million Series B financing. The round likely involved a mix of venture capital firms and possibly strategic corporate investors interested in humanoid robotics applications.&lt;/p&gt;

&lt;h3&gt;
  
  
  What will EngineAI use the $200 million for?
&lt;/h3&gt;

&lt;p&gt;The company stated the capital will be used to accelerate the deployment of its humanoid robots across multiple industries. This typically funds scaling manufacturing, expanding the engineering and software teams, conducting real-world pilot programs with customers, and advancing the core robotics technology.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does EngineAI's valuation compare to other robotics startups?
&lt;/h3&gt;

&lt;p&gt;A valuation exceeding $1.4 billion places EngineAI in the upper tier of privately held humanoid robotics companies. It is a competitive valuation, comparable to other well-funded players in the space such as Figure AI at similar stages, and indicates strong investor belief in its potential to capture a meaningful share of the emerging market.&lt;/p&gt;

&lt;h3&gt;
  
  
  What industries are targeted for humanoid robot deployment?
&lt;/h3&gt;

&lt;p&gt;While the announcement did not specify exact industries, humanoid robots are typically targeted at environments built for humans. Likely initial sectors include manufacturing (assembly, logistics), warehousing, laboratory research, and potentially customer service or eldercare, where a bipedal form factor is advantageous for navigating existing infrastructure.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://gentic.news/article/engineai-raises-200m-series-b" rel="noopener noreferrer"&gt;gentic.news&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>startup</category>
      <category>business</category>
      <category>funding</category>
    </item>
    <item>
      <title>Microsoft Agent Framework 1.0 Validates MCP</title>
      <dc:creator>gentic news</dc:creator>
      <pubDate>Sat, 11 Apr 2026 12:34:11 +0000</pubDate>
      <link>https://dev.to/gentic_news/microsoft-agent-framework-10-validates-mcp-5cff</link>
      <guid>https://dev.to/gentic_news/microsoft-agent-framework-10-validates-mcp-5cff</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Microsoft Agent Framework 1.0's built-in MCP support increases the ROI of your Claude Code MCP servers by making them portable to a major enterprise framework.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What Changed — Microsoft's Production-Grade MCP Bet
&lt;/h2&gt;

&lt;p&gt;On April 3, Microsoft shipped Agent Framework 1.0 with stable APIs for .NET and Python. This isn't another research project—it's a production-ready framework from a major cloud vendor. The most important feature for Claude Code developers is &lt;strong&gt;native, built-in support for the Model Context Protocol (MCP)&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This means the framework can dynamically discover and invoke tools from any MCP-compliant server. If you've built MCP servers for Claude Code—whether for database queries, API integrations, or custom workflows—those same servers now work with Microsoft's framework without any changes to your tool definitions, endpoints, or authentication.&lt;/p&gt;
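&lt;p&gt;The portability comes from tools being described by a schema rather than by framework-specific glue. A stdlib-only Python sketch of the idea follows; it is a simplified illustration, not the actual MCP wire format:&lt;/p&gt;

```python
# One tool, written once. MCP-style clients (Claude Code, Microsoft's
# framework, ...) discover it via a schema descriptor instead of
# framework-specific adapter code.
# NOTE: simplified illustration of the idea, not the real MCP wire format.

def count_users(table: str) -> int:
    """Count rows in a (mocked) users table."""
    fake_db = {"users": ["ada", "grace", "alan"]}
    return len(fake_db.get(table, []))

TOOL_DESCRIPTOR = {
    "name": "count_users",
    "description": count_users.__doc__,
    "inputSchema": {
        "type": "object",
        "properties": {"table": {"type": "string"}},
        "required": ["table"],
    },
}

def discover_tools() -> list:
    """What a client receives when it lists this server's tools."""
    return [TOOL_DESCRIPTOR]

def invoke(name: str, arguments: dict):
    """Dispatch a tool call by name, as an MCP server would."""
    if name == "count_users":
        return count_users(**arguments)
    raise KeyError(name)

print(invoke("count_users", {"table": "users"}))  # prints 3
```

&lt;p&gt;Because every client consumes the same descriptor, adding a new framework adds zero per-tool work.&lt;/p&gt;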

&lt;h2&gt;
  
  
  What It Means For You — Your MCP Investment Just Multiplied
&lt;/h2&gt;

&lt;p&gt;Before MCP adoption, building a tool for multiple AI frameworks meant creating N implementations for N frameworks. MCP collapses that work to a single implementation. Microsoft's endorsement in their 1.0 release significantly boosts MCP's legitimacy and adoption trajectory.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your existing MCP servers gain a new major client.&lt;/strong&gt; The framework includes first-party connectors for Claude (alongside OpenAI, Gemini, Bedrock, and Ollama). This creates a practical bridge: you can develop and test agents locally using Ollama (free, no API costs) through the framework, then deploy the same agent logic with Claude in production.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Ollama Connector: A New Development Loop
&lt;/h2&gt;

&lt;p&gt;The built-in Ollama connector changes how you can prototype agentic workflows that might eventually use Claude. Instead of burning Claude API tokens during development, you can:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Build your MCP server&lt;/strong&gt; (as you would for Claude Code)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prototype agent logic&lt;/strong&gt; in Microsoft Agent Framework using a local Ollama model&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test tool integrations&lt;/strong&gt; and orchestration patterns&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Swap the connector&lt;/strong&gt; to Claude for production deployment&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This creates a cheaper, faster iteration cycle for complex multi-agent systems that might be overkill for Claude Code's built-in agent capabilities but share the same underlying tools.&lt;/p&gt;
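&lt;p&gt;The swap in step 4 is cheap precisely because the agent logic depends only on a narrow completion interface. A stdlib sketch of the pattern; the connector names here are illustrative stand-ins, not the framework's real classes:&lt;/p&gt;

```python
from typing import Callable

# Agent logic written against a minimal "connector" interface: any
# callable mapping a prompt to a completion. Swapping Ollama for Claude
# then means changing one constructor argument, not the agent.
# NOTE: illustrative pattern only; the real framework's connector
# classes have different names and signatures.

def ollama_connector(prompt: str) -> str:
    return f"[local llama3.2] {prompt}"   # stand-in for a local model call

def claude_connector(prompt: str) -> str:
    return f"[claude] {prompt}"           # stand-in for a production API call

class Agent:
    def __init__(self, connector: Callable[[str], str]):
        self.connector = connector

    def run(self, task: str) -> str:
        # Orchestration logic stays identical across backends.
        return self.connector(f"Plan and execute: {task}")

dev = Agent(ollama_connector)    # free local iteration
prod = Agent(claude_connector)   # same logic in production
print(dev.run("count users"))
```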

&lt;h2&gt;
  
  
  The Middleware Model: Enterprise Features Without Lock-In
&lt;/h2&gt;

&lt;p&gt;The framework's middleware pipeline—inspired by ASP.NET—lets you inject compliance checks, safety filters, logging, and rate limiting without touching agent prompts. While solo developers might not need this, it demonstrates where the industry is heading: &lt;strong&gt;governance and observability as first-class concerns&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For Claude Code users watching industry trends, this signals that future MCP servers might need to consider audit logging and compliance metadata in their tool responses, as enterprise frameworks will expect to intercept and filter them.&lt;/p&gt;
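&lt;p&gt;The ASP.NET-style pipeline can be pictured as nested wrappers around the model call. A minimal stdlib sketch, with toy logging and safety stages standing in for real enterprise middleware:&lt;/p&gt;

```python
from typing import Callable

Handler = Callable[[str], str]

# Each middleware wraps the next handler, ASP.NET-style: it can inspect,
# log, or block a request without the agent prompt ever changing.

def logging_middleware(next_handler: Handler) -> Handler:
    def handler(request: str) -> str:
        print(f"audit: {request!r}")          # audit trail, prompt untouched
        return next_handler(request)
    return handler

def safety_middleware(next_handler: Handler) -> Handler:
    def handler(request: str) -> str:
        if "forbidden" in request:            # toy policy check
            return "blocked by policy"
        return next_handler(request)
    return handler

def model_call(request: str) -> str:
    return f"response to {request}"           # stand-in for the LLM call

pipeline = logging_middleware(safety_middleware(model_call))
print(pipeline("summarize report"))   # logged, then answered
print(pipeline("forbidden query"))    # logged, then blocked
```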

&lt;h2&gt;
  
  
  When To Consider This Framework (And When Not To)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Don't switch&lt;/strong&gt; if you're productively using Claude Code's built-in agent features for single-agent tasks. The framework adds orchestration overhead you don't need.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do explore&lt;/strong&gt; if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're building &lt;strong&gt;multi-agent systems&lt;/strong&gt; where different agents have specialized roles&lt;/li&gt;
&lt;li&gt;You need &lt;strong&gt;enterprise features&lt;/strong&gt; like audit trails, compliance middleware, or human-in-the-loop workflows&lt;/li&gt;
&lt;li&gt;You want to &lt;strong&gt;visualize agent execution&lt;/strong&gt; in the built-in DevUI debugger (it's excellent for understanding complex tool-call chains)&lt;/li&gt;
&lt;li&gt;Your organization &lt;strong&gt;standardizes on .NET&lt;/strong&gt; for backend services&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Bottom Line: Standards Over Silos
&lt;/h2&gt;

&lt;p&gt;Microsoft's move validates the MCP ecosystem that Claude Code helped pioneer. When a vendor of Microsoft's scale treats MCP as a core 1.0 feature rather than an afterthought, it reduces fragmentation across the AI tooling landscape.&lt;/p&gt;

&lt;p&gt;Your takeaway: &lt;strong&gt;Continue building MCP servers for your Claude Code workflow.&lt;/strong&gt; Each server you create now has potential utility in Microsoft's ecosystem (and likely others that will follow). The return on your MCP development time just increased substantially.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It Now
&lt;/h2&gt;

&lt;p&gt;To test compatibility with your existing MCP servers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install the Python package&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;microsoft-agent-framework

&lt;span class="c"&gt;# Create a simple agent that uses your MCP server&lt;/span&gt;
&lt;span class="c"&gt;# (assuming your MCP server is running on localhost:8000)&lt;/span&gt;

&lt;span class="c"&gt;# Example agent.py:&lt;/span&gt;
&lt;span class="s2"&gt;"""
from agent_framework import Agent
from agent_framework.connectors import OllamaConnector
from agent_framework.tools import MCPToolClient

# Connect to your existing MCP server
tool_client = MCPToolClient(server_url="&lt;/span&gt;http://localhost:8000&lt;span class="s2"&gt;")
tools = tool_client.discover_tools()

# Create agent with local model for testing
agent = Agent(
    name="&lt;/span&gt;claude_tool_tester&lt;span class="s2"&gt;",
    connector=OllamaConnector(model="&lt;/span&gt;llama3.2&lt;span class="s2"&gt;"),
    tools=tools
)

# Run a test query
response = agent.run("&lt;/span&gt;Use my database tool to count &lt;span class="nb"&gt;users&lt;/span&gt;&lt;span class="s2"&gt;")
print(response)
"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This lets you verify your MCP servers work in a new context without modifying them. The investment in open protocols pays dividends when major frameworks adopt them.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://gentic.news/article/microsoft-agent-framework-1-0" rel="noopener noreferrer"&gt;gentic.news&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>tech</category>
      <category>product</category>
    </item>
    <item>
      <title>Meta Launches Muse Spark, First Model Since Zuck's AI Funding Push</title>
      <dc:creator>gentic news</dc:creator>
      <pubDate>Sat, 11 Apr 2026 10:30:40 +0000</pubDate>
      <link>https://dev.to/gentic_news/meta-launches-muse-spark-first-model-since-zucks-ai-funding-push-288n</link>
      <guid>https://dev.to/gentic_news/meta-launches-muse-spark-first-model-since-zucks-ai-funding-push-288n</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Meta has launched a new AI model called Muse Spark. This is the company's first model release since CEO Mark Zuckerberg announced aggressive AI funding and a shift to open-source development in early 2026.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  Meta Launches Muse Spark, First AI Model Since Zuckerberg's Funding Push
&lt;/h1&gt;

&lt;p&gt;Meta has released a new AI model named &lt;strong&gt;Muse Spark&lt;/strong&gt;, marking the company's first model launch since CEO Mark Zuckerberg publicly committed to a massive increase in AI investment and a strategic shift toward open-source development earlier this year.&lt;/p&gt;

&lt;p&gt;The announcement was highlighted by AI commentator Rohan Paul, who noted the release follows a period where Zuckerberg has been "writing checks like crazy" for AI infrastructure and talent.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Happened
&lt;/h2&gt;

&lt;p&gt;Meta's AI research division has launched &lt;strong&gt;Muse Spark&lt;/strong&gt;, a new model whose specific architecture, capabilities, and scale have not yet been detailed in the initial announcement. The launch represents the first tangible output from Meta's renewed and heavily funded AI push, which Zuckerberg framed as essential to the company's future.&lt;/p&gt;

&lt;h2&gt;
  
  
  Context
&lt;/h2&gt;

&lt;p&gt;This release follows Zuckerberg's January 2026 announcement where he stated Meta would "go all in on AI" and dramatically increase spending on AI infrastructure, including plans to acquire hundreds of thousands of next-generation GPUs. He emphasized a commitment to open-source AI development, positioning Meta against more closed approaches from competitors like OpenAI and Google.&lt;/p&gt;

&lt;p&gt;The launch of Muse Spark suggests Meta's AI research teams are beginning to ship products from this accelerated investment cycle. The model's name hints at possible creative or multimodal capabilities, but technical specifications, benchmarks, and availability details are pending further official communication from Meta AI.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to Watch
&lt;/h2&gt;

&lt;p&gt;Practitioners should monitor for the release of a technical report or paper detailing Muse Spark's architecture, training data, and performance metrics. Key questions include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is Muse Spark a text, multimodal, or code model?&lt;/li&gt;
&lt;li&gt;What scale is it (parameter count)?&lt;/li&gt;
&lt;li&gt;Will it be released under an open-source license, as per Zuckerberg's stated direction?&lt;/li&gt;
&lt;li&gt;How does it compare to Meta's previous flagship models like Llama 3?&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  gentic.news Analysis
&lt;/h2&gt;

&lt;p&gt;This launch is the first concrete step in validating Zuckerberg's aggressive AI strategy. In early 2026, he committed to building "the most popular and most advanced AI products and services," directly challenging the current landscape dominated by OpenAI's GPT models and Google's Gemini. The Muse Spark release indicates that Meta's internal R&amp;amp;D pipeline is active and beginning to output new models, likely aiming to close the perceived gap with industry leaders.&lt;/p&gt;

&lt;p&gt;The strategic context is critical. Zuckerberg's shift to champion open-source AI (evident in the Llama series releases) created a distinct niche for Meta, appealing to developers and researchers frustrated by closed APIs. If Muse Spark follows this open approach, it could quickly become a foundational model for the open-source community, similar to how Llama 2 and 3 were adopted. However, if it's a closed product, it would represent a significant pivot and a more direct confrontation with OpenAI's business model.&lt;/p&gt;

&lt;p&gt;Timing is also key. The AI landscape in early 2026 is intensely competitive, with rapid iterations from all major players. A new model release from Meta was expected, but the speed of this launch—just months after the funding announcement—suggests either a repackaging of existing research or a highly accelerated development cycle fueled by massive compute investment. The AI engineering community will scrutinize Muse Spark's performance closely; it needs to be competitive with the latest offerings from Anthropic's Claude 3.5 Sonnet and Google's Gemini 1.5 Pro to be taken seriously as a top-tier model.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is Meta's Muse Spark AI model?
&lt;/h3&gt;

&lt;p&gt;Muse Spark is a newly announced AI model from Meta. It is the first model the company has released since CEO Mark Zuckerberg's public commitment in early 2026 to massively increase AI spending and infrastructure. Specific technical details about its capabilities, size, and architecture are not yet available.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is Muse Spark open source?
&lt;/h3&gt;

&lt;p&gt;The licensing model for Muse Spark has not been announced. However, Meta's recent strategy, articulated by Zuckerberg, has strongly favored open-source AI development (as seen with the Llama series). The community expects Muse Spark to be released under a permissive open-source license, but this remains unconfirmed.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does Muse Spark compare to Llama 3?
&lt;/h3&gt;

&lt;p&gt;Without published benchmarks or a technical paper, a direct comparison is impossible. The name "Muse Spark" suggests it may be a different class of model than the Llama series, potentially focusing on creative tasks, multimodality, or a specific application. It could also be a successor or a larger-scale version built on similar principles.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why is Meta releasing new AI models now?
&lt;/h3&gt;

&lt;p&gt;Meta is executing on a strategic pivot announced in early 2026, where CEO Mark Zuckerberg stated the company would "go all in" on AI to remain competitive. The release of Muse Spark is the first visible output of that initiative, which includes procuring vast amounts of new GPU hardware and focusing R&amp;amp;D efforts on generative AI.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://gentic.news/article/meta-launches-muse-spark-first" rel="noopener noreferrer"&gt;gentic.news&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>tech</category>
      <category>product</category>
    </item>
    <item>
      <title>Claude Managed Agents: How to Build on the Platform Instead of in Its Gaps</title>
      <dc:creator>gentic news</dc:creator>
      <pubDate>Sat, 11 Apr 2026 10:30:35 +0000</pubDate>
      <link>https://dev.to/gentic_news/claude-managed-agents-how-to-build-on-the-platform-instead-of-in-its-gaps-2nh3</link>
      <guid>https://dev.to/gentic_news/claude-managed-agents-how-to-build-on-the-platform-instead-of-in-its-gaps-2nh3</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Claude Managed Agents turns long-running, stateful agents into an API call. For developers, this means building durable applications on a stable platform, not temporary solutions in its gaps.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What Changed: The Agent Harness Is Now an API
&lt;/h2&gt;

&lt;p&gt;Anthropic just released &lt;strong&gt;Claude Managed Agents&lt;/strong&gt;. This isn't a minor feature update; it's a fundamental shift. The API provides fully managed containers, persistent sessions, built-in tool execution, memory, and long-running async tasks. In short, the entire "agent harness" that startups have been selling for $200-300/month is now a native platform capability.&lt;/p&gt;

&lt;p&gt;This follows a blistering 52-day period where Anthropic shipped 74 product releases, including the general availability of Claude Cowork, a plugin marketplace, free memory for all users, and Microsoft 365 integration. The pace is deliberate: they are systematically absorbing the value layers built on top of their models.&lt;/p&gt;

&lt;h2&gt;
  
  
  What It Means For Your Code
&lt;/h2&gt;

&lt;p&gt;If you're using Claude Code to build applications, your strategy needs to change. The old playbook was to use Claude's raw API and build your own orchestration, memory, and task management on top. That was the "gap." The new playbook is to use the platform's managed capabilities as your foundation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stop building the agent runtime.&lt;/strong&gt; Start building the specific logic, tools, and user experiences that sit &lt;em&gt;on top&lt;/em&gt; of a stable, managed agent runtime. Your code should delegate session persistence, tool execution scheduling, and context management to &lt;code&gt;claude.ai&lt;/code&gt; via the Managed Agents API. This makes your application simpler, more reliable, and future-proof against the next model update that inevitably includes more native capabilities.&lt;/p&gt;

&lt;h2&gt;
  
  
  How To Apply This Now
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Audit Your Projects:&lt;/strong&gt; Look at any Claude Code project where you've written custom logic for chaining calls, maintaining state between interactions, or managing long-running tasks. Flag these as candidates for migration to Managed Agents.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Shift Your Prompting Strategy:&lt;/strong&gt; Your prompts for Managed Agents should focus on &lt;strong&gt;task specification and tool selection&lt;/strong&gt;, not session management. Instead of writing prompts that say "Remember the user's name from earlier and use it in this response," you rely on the platform's memory. Your prompt becomes: "Using the user's stored profile, generate a personalized report."&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Build Specialized Tools, Not General Frameworks:&lt;/strong&gt; The moat is no longer "we have agents." The moat is "we have the best set of tools for [specific industry/use case]." Use Claude Code to develop and refine MCP servers that give your Managed Agents unique capabilities—like connecting to a proprietary internal API or a niche SaaS tool.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Example: Before vs. After&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Before (Fragile):&lt;/strong&gt; A Python script using the Chat Completions API, with a Redis cache for conversation history, a custom scheduler for multi-step tasks, and error-handling for tool timeouts.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;After (Durable):&lt;/strong&gt; A frontend that calls a Managed Agent with a specific goal ("analyze this codebase and suggest refactors"). The agent uses its persistent session to remember past analyses, natively calls the File System MCP server you've attached, and runs async. Your code is just the UI and the business logic for presenting results.&lt;/li&gt;
&lt;/ul&gt;
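&lt;p&gt;In the "after" shape, everything stateful is delegated and your code reduces to calling the agent and rendering results. A stdlib stand-in makes the shape concrete; the client class and its methods are hypothetical, not a published SDK:&lt;/p&gt;

```python
import json

# Stand-in for a Managed Agents client: the platform owns sessions,
# memory, and tool execution. Names here are hypothetical illustrations,
# not a real SDK surface.
class ManagedAgentStub:
    def __init__(self, agent_id: str):
        self.agent_id = agent_id
        self.session_memory: list[str] = []     # platform-side in reality

    def run(self, goal: str) -> dict:
        self.session_memory.append(goal)        # persistence is delegated
        return {"agent": self.agent_id, "goal": goal,
                "runs_so_far": len(self.session_memory)}

# Your actual application code: call the agent, present the results.
agent = ManagedAgentStub("refactor-advisor")
result = agent.run("analyze this codebase and suggest refactors")
print(json.dumps(result, indent=2))
```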

&lt;p&gt;The window between "we built this first" and "the platform absorbed it" is shrinking. Your job as a developer is to build &lt;em&gt;with&lt;/em&gt; the platform's accelerating capabilities, not in the temporary spaces it hasn't yet filled.&lt;/p&gt;

&lt;h2&gt;
  
  
  gentic.news Analysis
&lt;/h2&gt;

&lt;p&gt;This move by Anthropic is a direct continuation of the trend we identified in our coverage of &lt;a href="https://gentic.news/claude-code" rel="noopener noreferrer"&gt;Claude's 74 releases in 52 days&lt;/a&gt;. The platform is rapidly maturing, moving from a raw conversational model to a full-stack application runtime. Managed Agents represent the formalization of the "Cowork" paradigm—shifting Claude from a tool you query to a persistent entity you collaborate with.&lt;/p&gt;

&lt;p&gt;This aligns with, and accelerates, the trend of AI capabilities moving from third-party wrappers into core platforms. We saw this with memory, which went from a startup selling point to a free feature for all Claude users in March. Now, agent orchestration follows the same path. For developers, the lesson is clear: leverage MCP to build deep, specialized integrations that augment the platform's new native capabilities, rather than recreating the capabilities themselves. The next battleground isn't who has agents, but whose agents can do the most useful, specific work.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://gentic.news/article/claude-managed-agents-how-to-build" rel="noopener noreferrer"&gt;gentic.news&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>tech</category>
      <category>product</category>
    </item>
    <item>
      <title>How Spec-Driven Development with Claude Code Cuts Planning Time by 80%</title>
      <dc:creator>gentic news</dc:creator>
      <pubDate>Sat, 11 Apr 2026 09:24:31 +0000</pubDate>
      <link>https://dev.to/gentic_news/how-spec-driven-development-with-claude-code-cuts-planning-time-by-80-cl4</link>
      <guid>https://dev.to/gentic_news/how-spec-driven-development-with-claude-code-cuts-planning-time-by-80-cl4</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;A developer's workflow for using detailed spec files as the single source of truth for Claude Code, enabling precise, autonomous feature generation.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  How Spec-Driven Development with Claude Code Cuts Planning Time by 80%
&lt;/h1&gt;

&lt;h2&gt;
  
  
  The Technique: Spec-First, Code-Second
&lt;/h2&gt;

&lt;p&gt;The core technique is simple but transformative: before asking Claude Code to write any code, you write a detailed, structured specification file. This isn't a vague user story. It's a comprehensive document that includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Functional Requirements:&lt;/strong&gt; Every feature, button, and behavior.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Technical Constraints:&lt;/strong&gt; Framework, libraries, API patterns, and performance requirements.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Acceptance Criteria:&lt;/strong&gt; Concrete, testable conditions for success.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;File Structure:&lt;/strong&gt; The exact directory and file layout you expect.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You then pass this spec file to Claude Code as the primary context. The command is straightforward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;claude code &lt;span class="nt"&gt;--file&lt;/span&gt; project_spec.md &lt;span class="s2"&gt;"Implement the user authentication module as defined."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Why It Works: Context is Everything
&lt;/h2&gt;

&lt;p&gt;This works because it directly addresses the factor that most determines Claude Code's output quality: context. A vague prompt like "add user login" forces the model to guess your stack, patterns, and preferences, leading to repeated correction cycles. A comprehensive spec gives it a complete blueprint.&lt;/p&gt;

&lt;p&gt;This aligns with recent performance guidance from Anthropic warning against using elaborate personas (2026-04-01). A detailed spec is not a persona; it's direct, actionable data. It also leverages the power of Claude Opus 4.6, Anthropic's most capable model for complex reasoning, which excels at parsing detailed instructions and executing long-horizon tasks.&lt;/p&gt;

&lt;p&gt;By front-loading the thinking into the spec, you turn Claude Code from a conversational partner into an execution engine. It has all the information it needs to generate the correct code, in the right place, the first time.&lt;/p&gt;

&lt;h2&gt;
  
  
  How To Apply It: Your Spec Template
&lt;/h2&gt;

&lt;p&gt;Create a &lt;code&gt;SPEC.md&lt;/code&gt; file in your project root or feature directory. Use this structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Feature: [Feature Name]&lt;/span&gt;

&lt;span class="gu"&gt;## 1. Overview&lt;/span&gt;
[2-3 sentences on the goal.]

&lt;span class="gu"&gt;## 2. Functional Requirements&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; [ ] FR1: The user can...
&lt;span class="p"&gt;-&lt;/span&gt; [ ] FR2: The system must...

&lt;span class="gu"&gt;## 3. Technical Stack &amp;amp; Constraints&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**Framework:**&lt;/span&gt; Next.js 15
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**Database:**&lt;/span&gt; PostgreSQL, use Prisma ORM
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**API Style:**&lt;/span&gt; REST, JSON responses
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**Key Libraries:**&lt;/span&gt; &lt;span class="sb"&gt;`bcryptjs`&lt;/span&gt;, &lt;span class="sb"&gt;`jsonwebtoken`&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**File Naming:**&lt;/span&gt; Use kebab-case for components.

&lt;span class="gu"&gt;## 4. Acceptance Criteria&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**AC1:**&lt;/span&gt; Given a valid email/password, the API returns a 200 with a JWT.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**AC2:**&lt;/span&gt; Given an invalid password, the API returns a 401.

&lt;span class="gu"&gt;## 5. Implementation Plan &amp;amp; File Structure&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;project/&lt;br&gt;
├── src/&lt;br&gt;
│   ├── app/&lt;br&gt;
│   │   └── api/&lt;br&gt;
│   │       └── auth/&lt;br&gt;
│   │           ├── login/&lt;br&gt;
│   │           │   └── route.ts  &amp;lt;-- POST handler&lt;br&gt;
│   │           └── signup/&lt;br&gt;
│   │               └── route.ts&lt;br&gt;
│   └── lib/&lt;br&gt;
│       └── auth.ts  &amp;lt;-- JWT utility functions&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
## 6. Open Questions / Decisions Needed
- [ ] Decision: Should we use HTTP-only cookies or Bearer tokens?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With this file in place, your prompt to Claude Code becomes trivial. The model has its marching orders. This method is particularly powerful when combined with Claude Code's multi-file editing and direct git access, allowing it to create and modify dozens of files in a single, coherent pass.&lt;/p&gt;
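&lt;p&gt;Acceptance criteria written this concretely convert almost mechanically into tests. A stdlib sketch of AC1/AC2 against a hypothetical login handler; the token signing is a toy, not a production JWT library:&lt;/p&gt;

```python
import base64, hashlib, hmac, json

SECRET = b"dev-secret"
USERS = {"ada@example.com": "correct-horse"}   # hypothetical fixture data

def issue_jwt(email: str) -> str:
    """Minimal HS256-style token, just enough to make AC1 testable."""
    header = base64.urlsafe_b64encode(json.dumps({"alg": "HS256"}).encode())
    payload = base64.urlsafe_b64encode(json.dumps({"sub": email}).encode())
    sig = hmac.new(SECRET, header + b"." + payload, hashlib.sha256).hexdigest()
    return f"{header.decode()}.{payload.decode()}.{sig}"

def login(email: str, password: str) -> tuple:
    """AC1: valid credentials yield (200, token); AC2: invalid yield (401, {})."""
    if USERS.get(email) == password:
        return 200, {"token": issue_jwt(email)}
    return 401, {}

status, body = login("ada@example.com", "correct-horse")
assert status == 200 and "token" in body                 # AC1
assert login("ada@example.com", "nope") == (401, {})     # AC2
```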

&lt;h2&gt;
  
  
  The Result: From Planning to PR in One Session
&lt;/h2&gt;

&lt;p&gt;Adopting this workflow shifts your role from a micro-manager of code generation to an architect and reviewer. You spend 30 minutes writing a spec, then run Claude Code. It generates the code, runs shell commands to install dependencies, and can even create initial test stubs. You review the output against your spec—not against a moving target of your own poorly-communicated expectations.&lt;/p&gt;

&lt;p&gt;This follows the trend of increasingly agentic workflows with Claude Code, as seen in tools like the recently launched Computer Use feature (2026-03-30). Spec-driven development is a conceptual framework that prepares you to leverage these powerful execution capabilities effectively, ensuring the AI is working on the right problem.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://gentic.news/article/how-spec-driven-development-with" rel="noopener noreferrer"&gt;gentic.news&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>tech</category>
      <category>product</category>
    </item>
    <item>
      <title>Benchmark Shadows Study: Data Alignment Limits LLM Generalization</title>
      <dc:creator>gentic news</dc:creator>
      <pubDate>Sat, 11 Apr 2026 09:24:28 +0000</pubDate>
      <link>https://dev.to/gentic_news/benchmark-shadows-study-data-alignment-limits-llm-generalization-11m6</link>
      <guid>https://dev.to/gentic_news/benchmark-shadows-study-data-alignment-limits-llm-generalization-11m6</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;A controlled study finds that data distribution, not just volume, dictates LLM capability. Benchmark-aligned training inflates scores but creates narrow, brittle models, while coverage-expanding data leads to more distributed parameter adaptation and better generalization.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  Benchmark Shadows: Why High-Scoring LLMs Can Be Worse at Real Tasks
&lt;/h1&gt;

&lt;p&gt;A new preprint, "Benchmark Shadows: Data Alignment, Parameter Footprints, and Generalization in Large Language Models," provides a controlled, empirical dissection of a growing industry concern: the disconnect between soaring benchmark scores and underwhelming real-world performance. The research, posted to arXiv on April 1, 2026, isolates data distribution as the primary culprit, demonstrating that models trained on benchmark-aligned data develop fundamentally different—and inferior—internal structures compared to those trained on more diverse, coverage-expanding data.&lt;/p&gt;

&lt;p&gt;The findings challenge the core incentive structure of modern LLM development, where leaderboard position often dictates commercial and research priorities. The paper introduces novel parameter-space diagnostics that can detect these "benchmark shadows"—the spectral and rank signatures of overtrained, narrow models—offering a potential tool for more honest model evaluation.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Researchers Built: A Controlled Data Experiment
&lt;/h2&gt;

&lt;p&gt;The core of the study is a series of controlled interventions. Instead of comparing different models or training runs with countless variables, the researchers held the model architecture, training compute, and total data volume constant. They then manipulated only the &lt;em&gt;distribution&lt;/em&gt; of the training data.&lt;/p&gt;

&lt;p&gt;They created two primary data regimes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Benchmark-Aligned (BA) Regime:&lt;/strong&gt; Training data is heavily weighted or curated to resemble the style, format, and content of popular evaluation benchmarks (e.g., MMLU, HellaSwag, GSM8K).&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Coverage-Expanding (CE) Regime:&lt;/strong&gt; Training data is designed to maximize topic and stylistic diversity, even if it superficially differs from benchmark tasks.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;By fixing all other variables, the study cleanly attributes any differences in model behavior and internal structure to the data distribution alone.&lt;/p&gt;
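&lt;p&gt;The design can be pictured with a toy sampler: hold the total sample count fixed and vary only the per-topic sampling weights. The sketch below is purely illustrative; the corpus, topic labels, and weights are invented, not the paper's actual pipeline:&lt;/p&gt;

```python
# Illustrative sketch of the controlled-data setup: two training mixtures of
# identical size that differ only in distribution. Corpus, topic names, and
# weights are invented for demonstration, not taken from the paper.
import random

def make_mixture(corpus, n_samples, topic_weights, seed=0):
    """Sample n_samples docs; topics absent from topic_weights get weight 1."""
    rng = random.Random(seed)
    weights = [topic_weights.get(doc["topic"], 1.0) for doc in corpus]
    return rng.choices(corpus, weights=weights, k=n_samples)

corpus = (
    [{"topic": "benchmark_qa", "text": f"mmlu-style q{i}"} for i in range(1000)]
    + [{"topic": "code", "text": f"snippet {i}"} for i in range(1000)]
    + [{"topic": "dialogue", "text": f"chat {i}"} for i in range(1000)]
)

# Benchmark-Aligned (BA): heavily up-weight benchmark-style data.
ba = make_mixture(corpus, 5000, {"benchmark_qa": 10.0})
# Coverage-Expanding (CE): uniform over all topics.
ce = make_mixture(corpus, 5000, {})

ba_frac = sum(d["topic"] == "benchmark_qa" for d in ba) / len(ba)
ce_frac = sum(d["topic"] == "benchmark_qa" for d in ce) / len(ce)
```

&lt;p&gt;Because only the weight dictionary differs between the two calls, any downstream difference between models trained on &lt;code&gt;ba&lt;/code&gt; and &lt;code&gt;ce&lt;/code&gt; is attributable to distribution alone, which is exactly the attribution logic of the study.&lt;/p&gt;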

&lt;h2&gt;
  
  
  Key Results: The Generalization Gap
&lt;/h2&gt;

&lt;p&gt;The results reveal a stark trade-off, quantified through both performance metrics and novel structural analyses.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F49wlapv9c5u0dikiypsl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F49wlapv9c5u0dikiypsl.png" alt="Figure 10: Weight correlation with Qwen3-4B-Base in self_attn.v_proj for three MLLM instruct models. InternVL3.5-4B-Inst" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Training Regime&lt;/th&gt;
&lt;th&gt;Benchmark Performance&lt;/th&gt;
&lt;th&gt;Out-of-Distribution Generalization&lt;/th&gt;
&lt;th&gt;Parameter Adaptation Pattern&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Benchmark-Aligned (BA)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;High&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Poor&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Concentrated, low effective rank&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Coverage-Expanding (CE)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Slightly Lower&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Excellent&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Distributed, higher effective rank&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;As expected, BA-trained models excelled on the benchmarks they were aligned with. However, their performance collapsed on novel, out-of-distribution tasks designed to test reasoning, composition, and factual recall in unfamiliar formats. CE-trained models showed more robust, generalized capability, maintaining strong performance across both benchmark and novel evaluations.&lt;/p&gt;

&lt;p&gt;The critical insight is that &lt;strong&gt;benchmark performance alone is a misleading indicator of true capability.&lt;/strong&gt; A model can achieve a state-of-the-art score by becoming a narrow expert on the benchmark's "shadow," rather than developing broadly useful representations.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Works: Spectral Signatures in Parameter Space
&lt;/h2&gt;

&lt;p&gt;The paper's technical contribution is a method to diagnose this problem without needing a battery of new benchmarks. The researchers analyzed the models' parameter matrices (e.g., within attention and feed-forward layers) using spectral (eigenvalue) and rank analysis.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft5bg53859v2j8lk05qcm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft5bg53859v2j8lk05qcm.png" alt="Figure 9: Relative parameter change in self_attn.v_proj measured against the shared ancestor Qwen3-4B-Base for three MLL" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;BA Models&lt;/strong&gt; exhibited parameter matrices with a few dominant, large-magnitude singular values. This concentrated, low-effective-rank adaptation indicates that a small subset of directions becomes hyper-specialized for the benchmark tasks. The model is effectively "memorizing a shortcut."&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;CE Models&lt;/strong&gt; showed parameter matrices with a flatter, more distributed spectrum of singular values. This higher-effective-rank structure suggests broader, more balanced learning across the network, correlating with the ability to recombine knowledge flexibly for novel tasks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These "parameter footprints" are distinct structural signatures of the training regime. The study confirmed these patterns hold across diverse open-source model families and extended the finding to multimodal models (vision-language), suggesting the phenomenon is fundamental to large-scale pretraining.&lt;/p&gt;

&lt;p&gt;A revealing case study on "prompt repetition"—a common data artifact—showed that not all data quirks induce this regime shift. Simple repetition led to overfitting but did not produce the same concentrated spectral signature as deliberate benchmark alignment, indicating that &lt;em&gt;content and task distribution&lt;/em&gt;, not just artifacts, drive the effect.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why It Matters: A Crisis of Evaluation
&lt;/h2&gt;

&lt;p&gt;This research provides a formal, mechanistic explanation for the anecdotal experiences of many practitioners: a model that aces the benchmarks can feel dumber in production. It validates concerns about &lt;strong&gt;benchmark overfitting&lt;/strong&gt; and &lt;strong&gt;data contamination&lt;/strong&gt;, moving them from speculation to measurable phenomena.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0ro2d2t0uxw5zv3ly9rn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0ro2d2t0uxw5zv3ly9rn.png" alt="Figure 8: Delta effective rank in mlp.up_proj between instruct and thinking checkpoints for four models. Qwen3-VL-4B sho" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For companies building and evaluating LLMs, the implications are direct:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Leaderboard chasing is actively harmful&lt;/strong&gt; if it incentivizes curating training data to match benchmark distributions.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Model evaluation must expand&lt;/strong&gt; beyond static benchmarks to include dynamic, out-of-distribution, and real-world task suites.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;The proposed spectral diagnostics&lt;/strong&gt; could become a standard part of model auditing, providing a "readout" of how narrowly a model was trained.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The study arrives amid a week of intense activity on arXiv, with 16 mentions in our coverage, highlighting arXiv's role as the central nervous system for disseminating critical AI research. It also intersects with a major trend in our reporting: the evolution of &lt;strong&gt;Retrieval-Augmented Generation (RAG)&lt;/strong&gt;, which appeared in 8 articles this week. This research underscores why RAG is necessary—if base models are prone to becoming narrow benchmark experts, external knowledge retrieval is essential for grounding them in broader, real-world contexts.&lt;/p&gt;

&lt;h2&gt;
  
  
  gentic.news Analysis
&lt;/h2&gt;

&lt;p&gt;This paper formalizes a suspicion that has been circulating at the engineering level for over a year. It connects directly to the &lt;strong&gt;MIT &amp;amp; Anthropic benchmark released on April 4, 2026&lt;/strong&gt;, which revealed systematic limitations in AI coding assistants. That work showed models failing on practical coding tasks despite high benchmark scores; "Benchmark Shadows" provides the underlying &lt;em&gt;why&lt;/em&gt;: their training data was likely aligned to coding benchmarks (like HumanEval) rather than covering the messy diversity of real software development.&lt;/p&gt;

&lt;p&gt;The findings also critically inform the ongoing debate about the &lt;strong&gt;"RAG era,"&lt;/strong&gt; referenced in our April 3 coverage where Ethan Mollick discussed its potential decline as the dominant agent paradigm. If base models are inherently limited by benchmark-optimized training, then RAG or similar knowledge-augmentation techniques are not just a nice-to-have—they are a mandatory corrective. This research suggests the path forward isn't abandoning RAG, but building it with the understanding that the LLM it queries is likely a narrow expert that must be carefully guided.&lt;/p&gt;

&lt;p&gt;For practitioners, the immediate takeaway is to be deeply skeptical of benchmark claims. When evaluating a model, ask for its performance on &lt;em&gt;your&lt;/em&gt; data and tasks, not just MMLU. The spectral analysis techniques proposed, if adopted by the community, could become a powerful tool for due diligence, much like loss curves or attention maps are today.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is a "benchmark shadow" in LLMs?
&lt;/h3&gt;

&lt;p&gt;A "benchmark shadow" refers to the phenomenon where a large language model achieves high scores on standard evaluations by essentially learning the specific format, style, and content distribution of those benchmarks, rather than developing general reasoning capabilities. The model performs well in the narrow "shadow" of the benchmark but fails to generalize to real-world, out-of-distribution tasks.&lt;/p&gt;

&lt;h3&gt;
  
  
  How can you tell if an LLM is overtrained on benchmark data?
&lt;/h3&gt;

&lt;p&gt;The research proposes analyzing the model's internal parameter matrices using spectral (eigenvalue) and rank analysis. Models overtrained on benchmark-aligned data show parameter matrices dominated by a few large singular values, a concentrated, low-effective-rank structure. In contrast, models trained on diverse data show a flatter, more distributed spectrum of singular values, indicating broader learning.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does this mean benchmarks like MMLU or GSM8K are useless?
&lt;/h3&gt;

&lt;p&gt;Not useless, but insufficient. Benchmarks provide a standardized, scalable way to track progress and compare models. However, this study shows they cannot be the sole measure of capability. A comprehensive evaluation must now include performance on novel, out-of-distribution tasks and potentially the structural diagnostics described in the paper to guard against overfitting.&lt;/p&gt;

&lt;h3&gt;
  
  
  What should companies do to train more generalizable LLMs?
&lt;/h3&gt;

&lt;p&gt;The primary recommendation is to prioritize data diversity and coverage over benchmark alignment. Training datasets should be designed to expose the model to the widest possible range of topics, writing styles, reasoning formats, and factual domains, even if that data doesn't directly resemble common benchmark questions. Avoiding the curation of data purely to boost specific benchmark scores is critical.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://gentic.news/article/benchmark-shadows-study-data" rel="noopener noreferrer"&gt;gentic.news&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>research</category>
      <category>deeplearning</category>
    </item>
    <item>
      <title>Anthropic's Claude Code Boosts @-Mention Speed 3x for Large Enterprise Codebases</title>
      <dc:creator>gentic news</dc:creator>
      <pubDate>Sat, 11 Apr 2026 07:38:20 +0000</pubDate>
      <link>https://dev.to/gentic_news/anthropics-claude-code-boosts-mention-speed-3x-for-large-enterprise-codebases-kif</link>
      <guid>https://dev.to/gentic_news/anthropics-claude-code-boosts-mention-speed-3x-for-large-enterprise-codebases-kif</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Anthropic has released technical details on optimizing the @-mention feature in Claude Code, achieving a 3x speedup for large enterprise codebases. This addresses a critical performance bottleneck for developers working in massive, legacy code repositories.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  Anthropic's Claude Code Gets 3x Faster @-Mentions for Enterprise Codebases
&lt;/h1&gt;

&lt;p&gt;Anthropic has detailed a significant performance optimization for its Claude Code AI coding assistant, specifically targeting the &lt;code&gt;@&lt;/code&gt;-mention feature used to reference files, functions, and symbols within massive enterprise codebases. The update, prompted by feedback from a major enterprise customer, results in a &lt;strong&gt;3x speed improvement&lt;/strong&gt; for this common developer workflow in large-scale environments.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Happened
&lt;/h2&gt;

&lt;p&gt;Boris Cherny, Head of Product at Anthropic, shared on X that a "big enterprise customer" using Claude Code within "one of the world's biggest codebases" provided positive feedback, which led the team to investigate and optimize the performance of &lt;code&gt;@&lt;/code&gt;-mentions. The &lt;code&gt;@&lt;/code&gt; feature allows developers to quickly reference and insert code from other parts of the repository directly into their current context, a critical capability when navigating complex, million-line codebases.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Performance Bottleneck &amp;amp; Fix
&lt;/h2&gt;

&lt;p&gt;In large enterprise codebases—often characterized by decades of legacy code, monolithic architectures, and complex dependency graphs—the initial implementation of the &lt;code&gt;@&lt;/code&gt;-mention feature faced scalability challenges. The system needed to search, index, and retrieve relevant code symbols across potentially hundreds of thousands of files. The performance lag directly impacted developer productivity, creating friction in an otherwise streamlined AI-assisted workflow.&lt;/p&gt;

&lt;p&gt;While the specific technical details of the optimization were not fully disclosed in the public thread, such improvements typically involve:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Enhanced Indexing:&lt;/strong&gt; Moving from on-the-fly searches to pre-computed, incremental, or more efficient symbol indexes.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Query Optimization:&lt;/strong&gt; Rewriting the search and retrieval algorithms to reduce complexity, perhaps leveraging vector similarity more effectively or pruning irrelevant search branches faster.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Caching Strategies:&lt;/strong&gt; Implementing smarter, context-aware caching of frequently accessed symbols or file structures specific to a developer's current working module.&lt;/li&gt;
&lt;/ul&gt;
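&lt;p&gt;None of these internals are public, but the first bullet is easy to picture: a symbol index built once and updated incrementally turns each &lt;code&gt;@&lt;/code&gt;-mention keystroke into a dictionary lookup rather than a scan over the whole repository. A hypothetical sketch, not Anthropic's implementation:&lt;/p&gt;

```python
# Hypothetical sketch of a pre-computed prefix index for @-mention lookup
# (Anthropic has not published its implementation). Each keystroke becomes
# a dictionary lookup instead of a scan over every file.
from collections import defaultdict

class SymbolIndex:
    def __init__(self):
        self._by_prefix = defaultdict(set)

    def add(self, symbol, location):
        # Index every prefix of the symbol name, case-insensitively,
        # so partial @-mentions hit precomputed candidate sets.
        for i in range(1, len(symbol) + 1):
            self._by_prefix[symbol[:i].lower()].add((symbol, location))

    def lookup(self, prefix):
        # Constant-time dict access, then sort the (usually small) hit set.
        return sorted(self._by_prefix.get(prefix.lower(), set()))

idx = SymbolIndex()
idx.add("parseConfig", "src/config.ts:12")
idx.add("parseArgs", "src/cli.ts:40")
idx.add("renderPage", "src/ui.ts:7")
```

&lt;p&gt;The trade-off is index build time and memory up front in exchange for near-constant lookups, which is usually the right trade in a repository with hundreds of thousands of files.&lt;/p&gt;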

&lt;p&gt;The result is a feature that now responds &lt;strong&gt;three times faster&lt;/strong&gt; in the environments where performance matters most: the sprawling, intricate codebases of large financial institutions, tech giants, and legacy enterprises.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters for Enterprise AI Adoption
&lt;/h2&gt;

&lt;p&gt;This optimization is a textbook example of &lt;strong&gt;product-market fit refinement&lt;/strong&gt; for AI developer tools in the enterprise. While raw benchmark scores on curated coding challenges are important for marketing, real-world adoption hinges on solving specific, painful workflows. For enterprise developers, latency is a primary killer of tool adoption. A feature that takes 3 seconds feels broken; one that takes 1 second feels seamless. By directly addressing a performance pain point reported by a large customer, Anthropic is signaling a focus on the practical, day-to-day usability of Claude Code, not just its theoretical capabilities.&lt;/p&gt;

&lt;p&gt;This move also highlights the &lt;strong&gt;competitive battleground&lt;/strong&gt; in AI-assisted coding. It's no longer just about which model can solve the most LeetCode problems. The race is increasingly about integration depth, workflow understanding, and performance at scale. Speed and reliability inside massive, real-world codebases are features that directly compete with established tools like GitHub Copilot Enterprise, which is deeply integrated into the IDE and optimized for large repositories.&lt;/p&gt;

&lt;h2&gt;
  
  
  gentic.news Analysis
&lt;/h2&gt;

&lt;p&gt;This performance tweak, while seemingly minor, is strategically significant. It demonstrates Anthropic's responsive enterprise engagement model and its commitment to optimizing for scale—a core differentiator for Claude models, which are often marketed on their robustness and safety for large organizations. This follows Anthropic's established pattern of targeting the enterprise segment with Claude 3.5 Sonnet and its suite of tool-use features, positioning itself against OpenAI's ChatGPT Enterprise and Microsoft's GitHub Copilot suite.&lt;/p&gt;

&lt;p&gt;The feedback loop described—a major enterprise customer reporting an issue, leading to a targeted, publicized optimization—is a powerful signal to the market. It shows Anthropic is listening to high-value clients and prioritizing improvements that affect productivity in tangible ways. This aligns with the broader industry trend we noted in our coverage of &lt;strong&gt;Datadog's AI monitoring report&lt;/strong&gt;, where inference latency and cost were identified as the top two concerns for companies deploying AI applications. Anthropic is attacking the latency problem at the feature level.&lt;/p&gt;

&lt;p&gt;Furthermore, this underscores a key trend in the AI coding assistant space: the fight is moving from &lt;strong&gt;capability&lt;/strong&gt; to &lt;strong&gt;experience&lt;/strong&gt;. Most top-tier models (GPT-4, Claude 3.5 Sonnet, DeepSeek-Coder) can generate competent code. The winners will be those that best integrate into developer workflows, with minimal friction and maximal understanding of project context. Anthropic's deep optimization for large codebases is a direct play to win the trust of developers in the most complex environments, where the productivity payoff is highest.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is the @-mention feature in Claude Code?
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;@&lt;/code&gt;-mention feature allows developers to reference specific files, functions, classes, or other code symbols from anywhere in their codebase directly within their chat prompt to Claude Code. For example, typing &lt;code&gt;@&lt;/code&gt; might bring up a list of relevant functions from a &lt;code&gt;utils.js&lt;/code&gt; file to easily insert or discuss them, saving the developer from manually finding and copying code snippets.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why is speed for this feature so important in enterprise codebases?
&lt;/h3&gt;

&lt;p&gt;Enterprise codebases can contain millions of lines of code across hundreds of thousands of files. A slow search across this vast, interconnected graph of code can halt a developer's flow, making the AI tool feel sluggish and impractical. A 3x speedup turns a potentially frustrating wait into a near-instantaneous action, which is critical for maintaining productivity and developer satisfaction.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does Claude Code's optimization compare to GitHub Copilot's performance?
&lt;/h3&gt;

&lt;p&gt;While direct, head-to-head benchmarks on this specific feature are not publicly available, the announcement is a clear competitive move. GitHub Copilot, deeply integrated into IDEs like VS Code, has invested heavily in context-aware completions and understanding large repositories. Anthropic's optimization directly addresses a perceived weakness to compete on equal footing in the enterprise environment, where Copilot Enterprise is a strong incumbent.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does this optimization apply to all users of Claude Code?
&lt;/h3&gt;

&lt;p&gt;The optimization is likely most pronounced and impactful for users working with very large code repositories. Users with smaller projects may not notice a significant difference, as performance was likely already adequate. The fix is engineered specifically for the scale and complexity challenges unique to massive enterprise systems.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://gentic.news/article/anthropic-s-claude-code-boosts" rel="noopener noreferrer"&gt;gentic.news&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>tech</category>
      <category>product</category>
    </item>
    <item>
      <title>Build a Self-Improving Memory Layer for Claude Code with Hooks and RAG</title>
      <dc:creator>gentic news</dc:creator>
      <pubDate>Sat, 11 Apr 2026 07:38:16 +0000</pubDate>
      <link>https://dev.to/gentic_news/build-a-self-improving-memory-layer-for-claude-code-with-hooks-and-rag-5apm</link>
      <guid>https://dev.to/gentic_news/build-a-self-improving-memory-layer-for-claude-code-with-hooks-and-rag-5apm</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Implement automatic hooks to capture Claude Code's work into a ChromaDB vector store and a CLAUDE.md file, creating a persistent, searchable memory for your project.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The Technique: Automatic Knowledge Capture with Hooks
&lt;/h2&gt;

&lt;p&gt;The core innovation is using Claude Code's &lt;strong&gt;hook system&lt;/strong&gt; to intercept events and automatically log them to a persistent knowledge base. This solves the "stateless AI" problem where every session starts fresh, forcing you to re-debug the same issues.&lt;/p&gt;

&lt;p&gt;The author built a three-layer system:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;ChromaDB Vector Store&lt;/strong&gt;: For semantic search across captured errors, fixes, and learnings.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Graph Memory&lt;/strong&gt;: To track relationships (e.g., Error → occurred_in → File).&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;CLAUDE.md File&lt;/strong&gt;: A living project file updated after each session.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The magic is in the automation. You don't manually type &lt;code&gt;/learn&lt;/code&gt;. Scripts watch Claude Code's activity and save the important bits.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why It Works: Turning Ephemeral Sessions into Lasting Knowledge
&lt;/h2&gt;

&lt;p&gt;Claude Code is powerful but forgetful. Its context is limited to the current session. This system externalizes that context into a searchable format. When you ask, "Have we seen this auth error before?" the RAG system can find the exact fix from two weeks ago.&lt;/p&gt;

&lt;p&gt;The three-layer approach is key:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;ChromaDB&lt;/strong&gt; finds semantically similar issues, even if the phrasing differs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Graph Memory&lt;/strong&gt; answers relational questions like "What files are most error-prone?"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CLAUDE.md&lt;/strong&gt; gives Claude immediate, high-priority context at the start of every new session, loaded automatically.&lt;/li&gt;
&lt;/ul&gt;
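&lt;p&gt;Conceptually, the retrieval layer reduces to "store each learning, return the most similar one for a new question." Here is a dependency-free sketch of that step, with crude token overlap standing in for real embeddings; the actual system would go through ChromaDB's &lt;code&gt;collection.add()&lt;/code&gt; and &lt;code&gt;collection.query()&lt;/code&gt;:&lt;/p&gt;

```python
# Dependency-free sketch of the retrieval idea behind the ChromaDB layer:
# store captured learnings, return the most similar one for a new question.
# Token overlap stands in for real embeddings here.
def tokens(text):
    return set(text.lower().replace(":", " ").split())

def similarity(a, b):
    ta, tb = tokens(a), tokens(b)
    union = ta.union(tb)
    return len(ta.intersection(tb)) / max(1, len(union))  # Jaccard overlap

class MemoryStore:
    def __init__(self):
        self.entries = []

    def add(self, text):
        self.entries.append(text)

    def query(self, question, n=1):
        ranked = sorted(self.entries,
                        key=lambda e: similarity(e, question), reverse=True)
        return ranked[:n]

store = MemoryStore()
store.add("fixed auth error: JWT clock skew, set leeway=30 in jwt.decode")
store.add("migration failed: missing index on users.email")
best = store.query("have we seen this auth error before")[0]
```

&lt;p&gt;Swapping the token-overlap scorer for embedding similarity is what makes the real system robust to rephrasing: "auth failure" and "login 401" can land near each other in embedding space even with zero shared tokens.&lt;/p&gt;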

&lt;h2&gt;
  
  
  How To Apply It: Start with a Simple Hook
&lt;/h2&gt;

&lt;p&gt;You don't need to build the full three-layer system immediately. Start by automating your &lt;code&gt;CLAUDE.md&lt;/code&gt; file. Here’s a basic &lt;code&gt;session_summary.py&lt;/code&gt; hook you can adapt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ~/.config/claude-code/hooks/session_summary.py
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_stop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session_history&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;project_root&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Hook that runs when a Claude Code session ends.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;claude_md_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;project_root&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CLAUDE.md&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="c1"&gt;# Extract the last few messages to summarize the session
&lt;/span&gt;    &lt;span class="n"&gt;recent_activity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;session_history&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;:]&lt;/span&gt;  &lt;span class="c1"&gt;# Get last 5 messages
&lt;/span&gt;
    &lt;span class="n"&gt;summary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;## Session Summary - {date}

### What We Did
{activity_summary}

### Key Learnings / Pitfalls
- Add key insights here from the session.
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;strftime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;%Y-%m-%d&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;activity_summary&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;- &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;recent_activity&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Append to CLAUDE.md
&lt;/span&gt;    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;claude_md_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Hook] Session summary appended to &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;claude_md_path&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To use this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Save the script to your Claude Code hooks directory.&lt;/li&gt;
&lt;li&gt; Ensure &lt;code&gt;CLAUDE.md&lt;/code&gt; exists in your project root.&lt;/li&gt;
&lt;li&gt; Every time you end a session (&lt;code&gt;Ctrl+D&lt;/code&gt; or type &lt;code&gt;/exit&lt;/code&gt;), the hook will fire and append a summary.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For the next level, implement an error-capturing hook using the &lt;code&gt;PostToolUse&lt;/code&gt; event. The source provides a blueprint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# capture_failure.py (PostToolUse hook)
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;capture_failure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool_result&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tool_result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exit_code&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="c1"&gt;# Check if a shell command failed
&lt;/span&gt;        &lt;span class="n"&gt;error&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tool_result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# Get error snippet
&lt;/span&gt;        &lt;span class="c1"&gt;# Log to a simple JSON file for now
&lt;/span&gt;        &lt;span class="n"&gt;log_entry&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;isoformat&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;command&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tool_result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;command&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tool_result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;debug_log.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;log_entry&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This creates a searchable log of every failed command. You can later upgrade this to write to a local ChromaDB instance.&lt;/p&gt;
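&lt;p&gt;Even before adding a vector store, the flat JSONL file supports useful triage. A minimal sketch — the file name and the &lt;code&gt;command&lt;/code&gt; field follow the hook above; the aggregation logic is illustrative:&lt;/p&gt;

```python
# summarize_failures.py - quick triage over the JSONL error log.
# Assumes the debug_log.json format written by the capture_failure hook.
import json
from collections import Counter

def top_failing_commands(log_path="debug_log.json", n=3):
    """Return the n most frequently failing commands with their counts."""
    counts = Counter()
    try:
        with open(log_path) as f:
            for line in f:
                if line.strip():
                    counts[json.loads(line)["command"]] += 1
    except FileNotFoundError:
        pass  # no failures logged yet
    return counts.most_common(n)

if __name__ == "__main__":
    for command, count in top_failing_commands():
        print(f"{count:3d}x  {command}")
```

&lt;p&gt;Running this periodically surfaces the commands that fail most often — good candidates for a "Known Pitfalls" note.&lt;/p&gt;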

&lt;h2&gt;
  
  
  Integrating with MCP for a Smoother Workflow
&lt;/h2&gt;

&lt;p&gt;Once you have data being captured, you can serve it back to Claude Code using the &lt;strong&gt;Model Context Protocol (MCP)&lt;/strong&gt;, which Claude Code supports natively. You can write a simple MCP server that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Reads your &lt;code&gt;debug_log.json&lt;/code&gt; or ChromaDB.&lt;/li&gt;
&lt;li&gt; Exposes a tool like &lt;code&gt;search_past_errors(query)&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt; Lets Claude query past failures directly within the chat.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A basic MCP server skeleton:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# mcp_memory_server.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mcp.server&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Server&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;NotificationOptions&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;mcp.server.stdio&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="n"&gt;server&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Server&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-memory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@server.list_tools&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle_list_tools&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_past_errors&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Search through previously captured error logs.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inputSchema&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}]&lt;/span&gt;

&lt;span class="nd"&gt;@server.call_tool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle_call_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_past_errors&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# Simple grep-style search for now
&lt;/span&gt;        &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;debug_log.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
                        &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;FileNotFoundError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;pass&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;indent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Return top 5 matches
&lt;/span&gt;            &lt;span class="p"&gt;}]&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Run the server
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;mcp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;stdio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add this server to your project's MCP config (&lt;code&gt;.mcp.json&lt;/code&gt; in the project root, or register it with &lt;code&gt;claude mcp add&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nl"&gt;"memory"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"python"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"/path/to/mcp_memory_server.py"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;/div&gt;



&lt;p&gt;Now, in any session, you can ask Claude: "Use the &lt;code&gt;search_past_errors&lt;/code&gt; tool to see if we've hit a 'JWT expired' error before."&lt;/p&gt;

&lt;h2&gt;
  
  
  The Payoff: From Reactive to Proactive Debugging
&lt;/h2&gt;

&lt;p&gt;The end goal is to have your &lt;code&gt;CLAUDE.md&lt;/code&gt; file automatically prepopulated with a "Known Pitfalls" section. When you start a new session on a project, Claude immediately knows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"In &lt;code&gt;auth.ts&lt;/code&gt;, we often get JWT expiration errors; the fix is usually X."&lt;/li&gt;
&lt;li&gt;"The build fails if you don't run &lt;code&gt;generate-types&lt;/code&gt; first."&lt;/li&gt;
&lt;li&gt;"The deployment succeeded when we used strategy Y."&lt;/li&gt;
&lt;/ul&gt;
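&lt;p&gt;A small script can distill the captured log into exactly that section. A minimal sketch, assuming the &lt;code&gt;debug_log.json&lt;/code&gt; format from the capture hook above — the section header and one-line-per-command format are illustrative, not part of any Claude Code convention:&lt;/p&gt;

```python
# known_pitfalls.py - distill the error log into a CLAUDE.md section.
# Assumes the debug_log.json JSONL format from the capture_failure hook.
import json

def build_pitfalls_section(log_path="debug_log.json"):
    """Render a deduplicated 'Known Pitfalls' markdown section."""
    seen = {}  # first error snippet per failing command (insertion-ordered)
    try:
        with open(log_path) as f:
            for line in f:
                if line.strip():
                    entry = json.loads(line)
                    seen.setdefault(entry["command"], entry["error"])
    except FileNotFoundError:
        pass  # nothing captured yet
    lines = ["## Known Pitfalls"]
    for command, error in seen.items():
        first_line = (error.splitlines() or [""])[0]
        lines.append(f"- `{command}` has failed before: {first_line}")
    return "\n".join(lines)
```

&lt;p&gt;Appending its output to &lt;code&gt;CLAUDE.md&lt;/code&gt; (or regenerating the section on each run) gives every new session that context up front.&lt;/p&gt;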

&lt;p&gt;This transforms Claude Code from a brilliant but amnesiac assistant into a seasoned team member who remembers your project's entire history.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://gentic.news/article/build-a-self-improving-memory" rel="noopener noreferrer"&gt;gentic.news&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>research</category>
      <category>deeplearning</category>
    </item>
  </channel>
</rss>
