<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sameer Khan</title>
    <description>The latest articles on DEV Community by Sameer Khan (@monkfromearth).</description>
    <link>https://dev.to/monkfromearth</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F422077%2Fc59a6851-ab5b-4629-afc5-f46d01148b32.png</url>
      <title>DEV Community: Sameer Khan</title>
      <link>https://dev.to/monkfromearth</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/monkfromearth"/>
    <language>en</language>
    <item>
      <title>Google TPU 8 vs Nvidia: 8t and 8i Specs Explained</title>
      <dc:creator>Sameer Khan</dc:creator>
      <pubDate>Wed, 22 Apr 2026 20:50:16 +0000</pubDate>
      <link>https://dev.to/monkfromearth/google-tpu-8-vs-nvidia-8t-and-8i-specs-explained-3i75</link>
      <guid>https://dev.to/monkfromearth/google-tpu-8-vs-nvidia-8t-and-8i-specs-explained-3i75</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; AI is splitting into two economies: training and inference. Training is a handful of hyperscalers spending tens of billions on clusters that run for weeks. Inference is where every app, every agent, and every dollar of revenue actually lives. Google's TPU 8 is the first chip generation to treat that split as the default. It ships as two chips, an 8t for training and an 8i for inference. The 121 ExaFlops number is the headline. The split is the story. The economies that grow from it are the stakes.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why did Google split the TPU 8 into 8t and 8i?
&lt;/h2&gt;

&lt;p&gt;Every prior TPU generation has been one chip. So is every Nvidia GPU people argue about. One die, one package, one SKU, rented to you for both the weeks-long training run and the millisecond inference call.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Google's TPU 8 broke that pattern.&lt;/strong&gt; The 8t is a training chip: 9,600 of them wired into a single superpod, 121 ExaFlops of compute, 2 petabytes of shared high-bandwidth memory, roughly 3x the pod-level compute of Ironwood. &lt;sup id="fnref1"&gt;1&lt;/sup&gt; The 8i is an inference chip: 288 GB of HBM per chip, 384 MB of on-chip SRAM (3x the previous generation), 19.2 Tb/s of interconnect. &lt;sup id="fnref1"&gt;1&lt;/sup&gt;&lt;/p&gt;
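&lt;p&gt;As a sanity check, the pod-level numbers imply per-chip figures. This is back-of-envelope arithmetic from the announced totals, not published per-chip specs:&lt;/p&gt;

```python
# Back-of-envelope: per-chip figures implied by Google's pod-level numbers.
# Assumes 121 ExaFlops and 2 PB HBM are pod totals spread over 9,600 chips.
POD_CHIPS = 9_600
POD_FLOPS = 121e18          # 121 ExaFlops
POD_HBM_BYTES = 2e15        # 2 petabytes

flops_per_chip = POD_FLOPS / POD_CHIPS      # ~12.6 PFLOPs per 8t chip
hbm_per_chip = POD_HBM_BYTES / POD_CHIPS    # ~208 GB per 8t chip

print(f"{flops_per_chip / 1e15:.1f} PFLOPs/chip")
print(f"{hbm_per_chip / 1e9:.0f} GB HBM/chip")
```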

&lt;p&gt;Those are not two SKUs of the same silicon. Those are two different design targets.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyg833wtx192j5unw4dfs.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyg833wtx192j5unw4dfs.webp" alt="Training wants bandwidth, shown as a 3x3 grid of interconnected chips with data flowing between them; inference wants memory, shown as a single chip next to a tall terracotta memory block" width="800" height="422"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Training wants bandwidth. 9,600 chips have to exchange gradients every step, and the whole run stalls on the slowest link. That is why 8t doubles the interchip bandwidth and Google brags about 97% goodput, which is their way of saying the accelerators are actually computing instead of waiting on the network. &lt;sup id="fnref1"&gt;1&lt;/sup&gt;&lt;/p&gt;
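&lt;p&gt;Goodput is easier to feel with a toy calculation. The pod compute figure is from the announcement; everything derived from it here is illustrative:&lt;/p&gt;

```python
# Illustrative only: what 97% goodput means for a big training run.
# Derived numbers are hypothetical, not from Google's announcement.
pod_flops = 121e18        # peak pod compute
goodput = 0.97            # fraction of time chips compute vs. wait on network

effective_flops = pod_flops * goodput
idle_chip_hours_per_day = 9_600 * 24 * (1 - goodput)

print(f"effective: {effective_flops / 1e18:.1f} EFLOPs")
print(f"idle chip-hours per day at 97% goodput: {idle_chip_hours_per_day:.0f}")
```

&lt;p&gt;Even at 97%, a 9,600-chip pod leaves thousands of chip-hours on the table every day, which is why the fabric gets so much silicon.&lt;/p&gt;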

&lt;p&gt;Inference wants memory. A single chip answers a user query in milliseconds, and the bottleneck is how much of the model and the running context fit in HBM without spilling. That is why 8i has 288 GB per chip and 3x the on-chip SRAM. Nothing about that helps training. Everything about it helps agents.&lt;/p&gt;

&lt;h2&gt;
  
  
  What does the TPU 8i signal about inference workloads?
&lt;/h2&gt;

&lt;p&gt;There is a reason Google framed the 8i around what it calls the "agentic era." An agent is not a one-shot inference call. It is a loop: plan, call a tool, read the result, plan again, call another tool. Sometimes dozens of steps, sometimes hundreds. The model weights stay loaded. The KV cache keeps growing. Memory is not a nice-to-have. Memory is the budget.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffaidj7fk1qgiahcqjloe.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffaidj7fk1qgiahcqjloe.webp" alt="An agent loop with steps plan, call tool, read result, repeat, alongside three bars showing KV cache memory growing from step 1 to step 20" width="800" height="422"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;288 GB per chip is not a round number.&lt;/strong&gt; It is the number you pick when you have watched agents thrash HBM and decided to stop pretending 80 GB is enough. &lt;sup id="fnref1"&gt;1&lt;/sup&gt;&lt;/p&gt;
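&lt;p&gt;A rough KV-cache sizing sketch shows why. The model shape below is hypothetical (a 70B-class dense transformer with fp16 KV), chosen only to make the growth concrete:&lt;/p&gt;

```python
# Rough KV-cache sizing for an agent loop, to show why HBM is the budget.
# Model shape is hypothetical, not any specific production model.
layers, kv_heads, head_dim = 80, 8, 128
bytes_per_elem = 2                      # fp16
kv_per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem  # K and V

def kv_cache_gb(tokens):
    return kv_per_token * tokens / 1e9

# An agent that accumulates roughly 8k tokens of context per loop step:
for step in (1, 10, 50):
    print(f"step {step:3d}: {kv_cache_gb(step * 8_000):6.1f} GB of KV cache")
```

&lt;p&gt;By step 50 the cache alone has blown past an 80 GB part, before the weights are even counted. It still fits in 288 GB.&lt;/p&gt;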

&lt;p&gt;The performance-per-dollar claim is the tell. Google says 8i is 80% better on that metric than Ironwood and supports roughly 2x customer volume at the same cost. &lt;sup id="fnref1"&gt;1&lt;/sup&gt; Nobody talks about dollars-per-token when training is the bottleneck. They talk about dollars-per-token when the bill is dominated by the inference that happens every time someone asks Gemini to do something. Which it now is, for Google and for everyone else.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://monkfrom.earth/blogs/turboquant-kv-cache-compression" rel="noopener noreferrer"&gt;I wrote earlier&lt;/a&gt; about how TurboQuant compressed the KV cache 6x in software. TPU 8i is the hardware version of the same bet: inference economics now run the conversation, and the team that optimizes for them wins.&lt;/p&gt;

&lt;h2&gt;
  
  
  Is the universal GPU era ending with Google's TPU 8?
&lt;/h2&gt;

&lt;p&gt;Nvidia's H100 trains your model and serves your model. So does the B200. Nvidia does ship inference-leaning SKUs like the L4 and L40S, but the flagship data-center AI chip is still one die doing both jobs. That is the universal-GPU bet: one chip, two workloads, pay the compromise on both.&lt;/p&gt;

&lt;p&gt;The compromise is real. A training chip spends a lot of silicon on high-bandwidth fabric that an inference chip never uses. An inference chip wants big HBM and big SRAM that a training chip does not need in the same ratio. Force them into one die and you are renting every customer the worst of both worlds.&lt;/p&gt;

&lt;p&gt;Google is the biggest hyperscaler to ship purpose-built training and inference silicon in the same generation, though not the first to split them: &lt;strong&gt;AWS got there earliest, with Inferentia in 2019 and Trainium in 2021.&lt;/strong&gt; Microsoft followed with Maia. &lt;sup id="fnref2"&gt;2&lt;/sup&gt; Meta has MTIA. The pattern is not Google being weird; it is the industry quietly admitting that the one-size-fits-all GPU was a phase, not a destination.&lt;/p&gt;

&lt;p&gt;Call it what it is. The TPU 8 announcement is a fork in the road for AI silicon. Nvidia has the software moat and the universality. Google, AWS, Microsoft, and Meta have vertical integration and two chips each. The question for the next three years is whether the software moat survives once specialized silicon is 2x cheaper per watt on the workload that actually pays the bill.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who wins and who loses as AI splits into two economies?
&lt;/h2&gt;

&lt;p&gt;Once training and inference become different businesses, the winners and losers sort themselves into different columns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hyperscalers with volume on both sides win.&lt;/strong&gt; Google, AWS, Microsoft, Meta have the scale to justify two purpose-built chips instead of one compromise chip. Every specialized accelerator they ship is a workload they no longer rent from Nvidia. Training stays expensive; inference gets cheaper inside their walls than outside.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Nvidia's dominance is challenged, not broken.&lt;/strong&gt; CUDA, NCCL, and two decades of tooling keep training workloads locked in. That is the half of the business that still prints money. Inference is the half that grows faster, and inference is where the hyperscalers are quietly migrating workloads onto their own silicon. The ceiling on Nvidia's growth is now set by how fast TPU, Trainium, and Maia can absorb inference volume.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Foundation model labs that do not own silicon get squeezed.&lt;/strong&gt; Anthropic rents from AWS and Google. OpenAI rents from Microsoft and the Stargate partners. All three of those landlords are building competitive models on the same chips they are renting out. The rent keeps going up and the cross-subsidy is one-way.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Startups and app builders live or die on inference economics.&lt;/strong&gt; If you are building on foundation models, your margin is tokens-per-dollar. When hyperscalers improve inference performance-per-dollar by 80% on their own silicon, that becomes the floor everyone else has to compete with. The team that ships the cheapest inference at scale becomes the cheapest place to build an app. For builders, that is a feature, not a threat. For anyone reselling Nvidia capacity with a markup, it is a countdown.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Margins move to whoever runs the cheapest inference at scale.&lt;/strong&gt; Training is a capex line item, amortized over the life of a model. Inference is a variable cost on every single request. Whoever controls the variable cost controls the unit economics of the AI industry. That is the prize.&lt;/p&gt;
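&lt;p&gt;A toy unit-economics model makes the capex-versus-variable-cost point concrete. Every number below is made up for illustration:&lt;/p&gt;

```python
# Toy model: why inference dominates unit economics once volume is large.
# All figures are invented for illustration, not estimates of any real model.
training_capex = 2e9            # one-time training cost, dollars
model_lifetime_tokens = 1e16    # tokens served over the model's life
inference_cost_per_mtok = 0.50  # variable cost per million tokens served

amortized_training_per_mtok = training_capex / (model_lifetime_tokens / 1e6)
total_per_mtok = amortized_training_per_mtok + inference_cost_per_mtok

print(f"training, amortized: ${amortized_training_per_mtok:.2f} per 1M tokens")
print(f"inference, variable: ${inference_cost_per_mtok:.2f} per 1M tokens")
print(f"share of cost that is inference: {inference_cost_per_mtok / total_per_mtok:.0%}")
```

&lt;p&gt;At serving volumes like these, the one-time training bill shrinks to a rounding line and the variable inference cost is the whole game.&lt;/p&gt;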

&lt;h2&gt;
  
  
  Is the TPU 8 interconnect actually falling behind AWS and Microsoft?
&lt;/h2&gt;

&lt;p&gt;A recurring critique on the Hacker News thread was that Google's memory-to-interconnect ratio is slipping. &lt;sup id="fnref2"&gt;2&lt;/sup&gt; Worth taking seriously, and worth checking against the actual numbers, because the commenter had the units confused.&lt;/p&gt;

&lt;p&gt;Here is the like-for-like comparison, all bidirectional per chip:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ironwood (TPU v7):&lt;/strong&gt; 1.2 TB/s (9.6 Tb/s aggregate across four ICI links). &lt;sup id="fnref3"&gt;3&lt;/sup&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google TPU 8i:&lt;/strong&gt; 2.4 TB/s (19.2 Tb/s per Google). &lt;sup id="fnref1"&gt;1&lt;/sup&gt; Roughly double Ironwood. Matches Google's "2x interconnect" claim.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS Trainium3:&lt;/strong&gt; 2 TB/s on NeuronLink-v4, inside a 144-chip UltraServer. &lt;sup id="fnref4"&gt;4&lt;/sup&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Microsoft Maia 200:&lt;/strong&gt; 2.8 TB/s bidirectional on an integrated on-die NIC. &lt;sup id="fnref5"&gt;5&lt;/sup&gt;
&lt;/li&gt;
&lt;/ul&gt;
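&lt;p&gt;The unit mixup behind the critique is worth making explicit. Eight bits per byte is the whole trick:&lt;/p&gt;

```python
# The Tb/s vs TB/s confusion, resolved: divide terabits by 8 to get terabytes.
def tbits_to_tbytes(tbps):
    return tbps / 8

chips = {
    "Ironwood (v7)": tbits_to_tbytes(9.6),   # Google quotes 9.6 Tb/s aggregate ICI
    "TPU 8i":        tbits_to_tbytes(19.2),  # Google quotes 19.2 Tb/s
    "Trainium3":     2.0,                    # AWS quotes TB/s directly
    "Maia 200":      2.8,                    # Microsoft quotes TB/s directly
}
for name, tb in sorted(chips.items(), key=lambda kv: kv[1]):
    print(f"{name:14s} {tb:.1f} TB/s")
```

&lt;p&gt;Read in the same units, 8i sits at 2.4 TB/s, not 1.2; the lower figure is Ironwood's.&lt;/p&gt;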

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftbiminc1n0zaocp4bwvs.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftbiminc1n0zaocp4bwvs.webp" alt="Horizontal bar chart comparing interconnect bandwidth per chip: Ironwood 1.2 TB/s, Trainium3 2.0 TB/s, TPU 8i 2.4 TB/s highlighted in terracotta, Maia 200 2.8 TB/s" width="800" height="422"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;TPU 8i is not behind the pack. It beats Trainium3 and sits just shy of Maia 200. The "1.2" figure that got circulated was Ironwood, not 8i. Google doubled the number, and the doubling lands them in contention with the chips they are supposed to be losing to.&lt;/p&gt;

&lt;p&gt;The real open question is ratios. Maia 200 ships 216 GB of HBM; TPU 8i ships 288 GB. Bigger memory pools need more bandwidth to drain, and at some point inference workloads start begging for more interconnect. That tradeoff is real. But it is a tuning debate inside a competitive band, not evidence Google has fallen off.&lt;/p&gt;
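&lt;p&gt;One crude way to compare the ratios: how long each chip would take to move a full HBM's worth of data over its own interconnect. This is a rough heuristic, not a workload benchmark:&lt;/p&gt;

```python
# Memory-to-interconnect ratio as a "drain time": seconds to move the entire
# HBM pool over the chip's interconnect. Lower means bandwidth keeps pace
# with memory. A heuristic, not a measured workload number.
def drain_seconds(hbm_gb, link_tb_per_s):
    return (hbm_gb / 1000) / link_tb_per_s

print(f"TPU 8i:   {drain_seconds(288, 2.4) * 1000:.0f} ms")  # 288 GB over 2.4 TB/s
print(f"Maia 200: {drain_seconds(216, 2.8) * 1000:.0f} ms")  # 216 GB over 2.8 TB/s
```

&lt;p&gt;Maia's smaller pool drains faster relative to its links, which is exactly the tuning debate the paragraph above describes.&lt;/p&gt;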

&lt;h2&gt;
  
  
  How does Google's TPU 8 move the AI moat to silicon?
&lt;/h2&gt;

&lt;p&gt;Step back from the chip. Look at the stack.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Google owns every layer:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Fab relationship&lt;/strong&gt; with TSMC&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chip design&lt;/strong&gt; (TPU 8)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Interconnect&lt;/strong&gt; (ICI)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data centers&lt;/strong&gt; (with custom Axion CPUs)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compiler&lt;/strong&gt; (XLA)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Training framework&lt;/strong&gt; (JAX)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Serving stack&lt;/strong&gt; (for inference)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model&lt;/strong&gt; (Gemini)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Product&lt;/strong&gt; (Search, Workspace, Android)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8uiatwkybh1t7628s3aw.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8uiatwkybh1t7628s3aw.webp" alt="A nine-layer stack showing every layer Google owns, from TSMC fabs at the bottom to Gemini and consumer products at the top" width="800" height="422"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When TPU 8 ships, Google's own workloads get the 2x perf-per-watt before anyone else does. And the people who rent Google's TPUs are renting a stack that was optimized end to end by the same company.&lt;/p&gt;

&lt;p&gt;Anthropic leans on AWS and Google Cloud. OpenAI leans on Microsoft and the Stargate partners. The labs with the best models rent their silicon. Google builds its own.&lt;/p&gt;

&lt;p&gt;Now look at what the last twelve months showed us about models. &lt;strong&gt;DeepSeek R1 replicated frontier capability at a fraction of the training cost in January 2025.&lt;/strong&gt; &lt;sup id="fnref6"&gt;6&lt;/sup&gt; Open weights caught up faster than anyone expected. Llama, Qwen, Mistral, DeepSeek, Gemma: the gap between the best closed model and a competent open one keeps shrinking. Models replicate. That is the whole point of software.&lt;/p&gt;

&lt;p&gt;Fabs do not replicate. You cannot fork TSMC. You cannot clone a 9,600-chip liquid-cooled superpod on a weekend. The thing the industry spent two years arguing about, whose model is smartest, turns out to be the part that commoditizes fastest. The thing nobody argues about, whose silicon is cheapest per useful token, is the part that compounds. &lt;a href="https://monkfrom.earth/blogs/openai-122b-what-it-means-for-ai-space" rel="noopener noreferrer"&gt;The $122B OpenAI raised&lt;/a&gt; is mostly going to buy this capacity, not build better models.&lt;/p&gt;

&lt;p&gt;This is the same lesson constraints usually teach. The visible layer changes constantly. The load-bearing layer underneath does not, and whoever owns it wins slowly, then suddenly. Gemini can stay a half-step behind Claude on agentic coding and Google still comes out ahead if the cost to serve is half. Skeptics on the Hacker News thread were right that the model quality gap is real. &lt;sup id="fnref2"&gt;2&lt;/sup&gt; They were arguing about the wrong layer.&lt;/p&gt;

&lt;p&gt;The TPU 8 split is not an engineering footnote. It is the moment Google stopped pretending the moat was the model.&lt;/p&gt;




&lt;h2&gt;
  
  
  Key takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AI is splitting into two economies.&lt;/strong&gt; Training is capex-heavy and concentrated in a handful of hyperscalers. Inference is where apps, agents, and revenue actually scale. TPU 8 is the first chip generation to treat the split as the default.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TPU 8 is two chips.&lt;/strong&gt; 8t for training (9,600-chip pods, 121 ExaFlops, 2 PB HBM). 8i for inference (288 GB HBM, 384 MB SRAM, 19.2 Tb/s interconnect). &lt;sup id="fnref1"&gt;1&lt;/sup&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Up to 2x performance-per-watt versus Ironwood&lt;/strong&gt; on both chips; 3x pod compute on 8t; 80% better performance-per-dollar on 8i. &lt;sup id="fnref1"&gt;1&lt;/sup&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hyperscalers win, Nvidia gets squeezed on inference, labs without silicon pay rent both ways.&lt;/strong&gt; Margins move to whoever runs the cheapest inference at scale.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The moat is moving to silicon.&lt;/strong&gt; Models replicate (DeepSeek). Fabs and full-stack integration do not. &lt;sup id="fnref6"&gt;6&lt;/sup&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;General availability later in 2026.&lt;/strong&gt; Citadel Securities is the first named customer. &lt;sup id="fnref1"&gt;1&lt;/sup&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Frequently asked questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What are the TPU 8t and TPU 8i?
&lt;/h3&gt;

&lt;p&gt;They are the two chips in Google's eighth generation TPU. The 8t is the training chip, built into 9,600-chip superpods that deliver 121 ExaFlops and 2 petabytes of shared high-bandwidth memory. The 8i is the inference chip, with 288 GB of HBM, 384 MB of on-chip SRAM, and 19.2 Tb/s of interconnect bandwidth per chip. &lt;sup id="fnref1"&gt;1&lt;/sup&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  How does Google's TPU 8 compare to Ironwood?
&lt;/h3&gt;

&lt;p&gt;Google cites up to 2x better performance-per-watt versus Ironwood and roughly 3x more compute per pod on 8t. &lt;sup id="fnref1"&gt;1&lt;/sup&gt; Logan Kilpatrick from Google framed the headline gain as 2 to 3x depending on workload. &lt;sup id="fnref7"&gt;7&lt;/sup&gt; TPU 8i claims 80% better performance-per-dollar and supports roughly 2x customer volume at the same cost.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why did Google split training and inference in TPU 8?
&lt;/h3&gt;

&lt;p&gt;Training and inference want different hardware. Training is bandwidth-hungry across thousands of chips running for weeks. Inference is memory-hungry on a single chip running for milliseconds. Ironwood was one chip forced to serve both. TPU 8 admits the compromise was costing money and built two.&lt;/p&gt;

&lt;h3&gt;
  
  
  When will Google's TPU 8 be available?
&lt;/h3&gt;

&lt;p&gt;General availability is planned for later in 2026. &lt;sup id="fnref1"&gt;1&lt;/sup&gt; Citadel Securities is the named early customer in Google's announcement.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I break down things like this on &lt;a href="https://linkedin.com/in/monkfromearth" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;, &lt;a href="https://x.com/monkfromearth" rel="noopener noreferrer"&gt;X&lt;/a&gt;, and &lt;a href="https://instagram.com/monkfrom.earth" rel="noopener noreferrer"&gt;Instagram&lt;/a&gt;. Usually shorter, sometimes as carousels. If this resonated, you would probably like those too.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;




&lt;ol&gt;

&lt;li id="fn1"&gt;
&lt;p&gt;&lt;a href="https://blog.google/innovation-and-ai/infrastructure-and-cloud/google-cloud/eighth-generation-tpu-agentic-era/" rel="noopener noreferrer"&gt;Google: Eighth generation TPU for the agentic era&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn2"&gt;
&lt;p&gt;&lt;a href="https://news.ycombinator.com/item?id=47862497" rel="noopener noreferrer"&gt;Hacker News discussion of TPU 8 announcement&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn3"&gt;
&lt;p&gt;&lt;a href="https://docs.cloud.google.com/tpu/docs/tpu7x" rel="noopener noreferrer"&gt;Google Cloud: TPU7x (Ironwood) documentation&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn4"&gt;
&lt;p&gt;&lt;a href="https://aws.amazon.com/ec2/instance-types/trn3/" rel="noopener noreferrer"&gt;AWS: Trn3 UltraServers and NeuronLink-v4&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn5"&gt;
&lt;p&gt;&lt;a href="https://techcommunity.microsoft.com/blog/azureinfrastructureblog/deep-dive-into-the-maia-200-architecture/4489312" rel="noopener noreferrer"&gt;Microsoft: Deep dive into the Maia 200 architecture&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn6"&gt;
&lt;p&gt;&lt;a href="https://dev.to/blogs/turboquant-kv-cache-compression"&gt;I wrote earlier about KV cache compression and the software side of this same bet&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn7"&gt;
&lt;p&gt;&lt;a href="https://x.com/OfficialLoganK/status/2046998392434508143" rel="noopener noreferrer"&gt;Logan Kilpatrick on X: TPU 8 and Gemini&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;

</description>
      <category>ai</category>
      <category>infrastructure</category>
      <category>google</category>
      <category>nvidia</category>
    </item>
    <item>
      <title>Claude Design vs Figma, Lovable, v0: What's Different</title>
      <dc:creator>Sameer Khan</dc:creator>
      <pubDate>Tue, 21 Apr 2026 07:20:09 +0000</pubDate>
      <link>https://dev.to/monkfromearth/claude-design-vs-figma-lovable-v0-whats-different-44mi</link>
      <guid>https://dev.to/monkfromearth/claude-design-vs-figma-lovable-v0-whats-different-44mi</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; Figma, Lovable, v0, and Claude Design are not the same tool. They pick different starting points: &lt;strong&gt;the design file, an idea, a component prompt, your codebase.&lt;/strong&gt; Different starting points, different jobs.&lt;/p&gt;




&lt;p&gt;If you have shipped a product, you know the cycle. Brief to designer. Something comes back that does not quite match the brand. Revise. Engineer reinterprets the spec. Revise again. Two weeks later, the thing looks slightly off.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Early adopters described cutting that whole cycle to a single conversation.&lt;/strong&gt; One team reported going from a week-long brief-to-code loop to one session. That is the shift worth unpacking, and it gets lost when Claude Design is compared head-to-head with tools solving different problems.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Do Figma AI, Lovable, and v0 Actually Do?
&lt;/h2&gt;

&lt;p&gt;Each tool has a clear job. The press keeps comparing the wrong jobs.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Figma Make&lt;/strong&gt; (Figma's AI layer): generates designs from prompts inside the Figma canvas. &lt;strong&gt;Starts from the design file.&lt;/strong&gt; &lt;sup id="fnref1"&gt;1&lt;/sup&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lovable&lt;/strong&gt;: turns a plain-language description into a full-stack deployable app. &lt;strong&gt;Starts from an idea.&lt;/strong&gt; &lt;sup id="fnref2"&gt;2&lt;/sup&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;v0 by Vercel&lt;/strong&gt;: generates React and Tailwind components from prompts. Developer-facing, fast. &lt;strong&gt;Starts from a component need.&lt;/strong&gt; &lt;sup id="fnref2"&gt;2&lt;/sup&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude Design&lt;/strong&gt;: reads your GitHub repo and generates designs shaped by what is already there. &lt;strong&gt;Starts from your production codebase.&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Four tools, four starting points.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Does Claude Design Do That Figma, Lovable, and v0 Don't?
&lt;/h2&gt;

&lt;p&gt;When you connect a GitHub repo, Claude Design reads your codebase and extracts: &lt;sup id="fnref3"&gt;3&lt;/sup&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tailwind config:&lt;/strong&gt; your spacing scale, breakpoints, color tokens&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Global CSS:&lt;/strong&gt; your CSS variables, font stacks, base styles&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Font declarations and logo SVGs:&lt;/strong&gt; the visual identity already in your code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Component names:&lt;/strong&gt; the vocabulary your engineers use&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What comes out is a design system that already matches what you shipped.&lt;/strong&gt; Not one you configure. The one living in your repo.&lt;/p&gt;
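&lt;p&gt;To make the idea concrete, here is a minimal sketch of the kind of extraction described above: pulling CSS custom properties out of a global stylesheet. This is not Anthropic's implementation, just an illustration of why a repo is a machine-readable source of design tokens:&lt;/p&gt;

```python
# Hypothetical sketch: extract CSS custom properties (design tokens) from a
# global stylesheet. Illustrative only, not Claude Design's actual extractor.
import re

global_css = """
:root {
  --color-primary: #c4572a;
  --color-surface: #faf6f1;
  --font-body: "Inter", sans-serif;
  --space-unit: 4px;
}
"""

# Each CSS variable is a name/value pair a generator could reuse directly.
tokens = dict(re.findall(r"(--[\w-]+)\s*:\s*([^;]+);", global_css))
for name, value in tokens.items():
    print(f"{name} = {value.strip()}")
```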

&lt;p&gt;Two designers ran a live side-by-side the day it launched. Same brief to both tools: redesign a real blog in Readymag style, passed in as a screenshot and a markdown context file. Claude Design produced a layout that tracked the reference. Lovable produced something competent but generic, closer to a WordPress theme than the brand they pointed at. Their read: &lt;strong&gt;"designers now can cook."&lt;/strong&gt; Not a replacement, a lever. &lt;sup id="fnref4"&gt;4&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F81ni6cqrtnpc2nqzfunc.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F81ni6cqrtnpc2nqzfunc.webp" alt="Claude Design interface showing prompt-to-prototype with design system extraction" width="800" height="449"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You build from there. Prompt to prototype. When the prototype is ready, one instruction passes it to &lt;a href="https://monkfrom.earth/blogs/zuckerberg-back-to-coding-claude-code" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt;, which also reads your codebase. &lt;strong&gt;The loop closes: idea, design, production code, no translation step.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Lovable and v0 aim at different outputs. Lovable gives a greenfield founder a new app. v0 gives a developer a component to paste in. Claude Design gives a team with an existing product something pre-fitted to their repo. &lt;sup id="fnref5"&gt;5&lt;/sup&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Does Starting From Your Codebase Matter?
&lt;/h2&gt;

&lt;p&gt;Different starting points serve different people.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Figma&lt;/strong&gt; treats the design file as the canonical home for a brand. For design teams, that is still true.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude Design&lt;/strong&gt; treats the repo as canonical. That fits a different team: one where design intent already lives in Tailwind tokens, CSS variables, and component names.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;This matters most to one person: the engineer or PM extending a live product.&lt;/strong&gt; Not building something new. Not exploring from a blank canvas. Extending what is already there, in a way that matches what is already there.&lt;/p&gt;

&lt;p&gt;For that person, starting from the repo removes a translation step. The output is already shaped by the code it will land in. The other tools are not worse at this. They are aimed elsewhere.&lt;/p&gt;




&lt;p&gt;I post breakdowns like this regularly on &lt;a href="https://linkedin.com/in/monkfromearth" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; and &lt;a href="https://instagram.com/monkfrom.earth" rel="noopener noreferrer"&gt;Instagram&lt;/a&gt;. The angle is always what it means for builders, not what the press release says.&lt;/p&gt;




&lt;h2&gt;
  
  
  When Should You Use Each Tool?
&lt;/h2&gt;

&lt;p&gt;Pick by the starting point that matches your job.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Figma&lt;/strong&gt; is the tool for design teams on a shared canvas. Pixel precision, component libraries, review workflows, handoff annotations. Claude Design does none of this. &lt;sup id="fnref5"&gt;5&lt;/sup&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lovable&lt;/strong&gt; is the tool when you have no product yet and want idea to deployed app without code. MVP, internal tool, first prototype. Claude Design is not for that.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;v0&lt;/strong&gt; is the tool when you need a React component fast and can edit code. Claude Design is not trying to replace that.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Claude Design is aimed at a specific step:&lt;/strong&gt; you have a live product, a new feature to design, and you need something that already matches everything you built. Teams have always solved this with some combination of briefs, design exploration, review, handoff, and engineering interpretation. Claude Design compresses that into a conversation that starts from the repo. Whether that is the right trade depends on the team.&lt;/p&gt;

&lt;p&gt;The broader pattern is familiar. &lt;a href="https://monkfrom.earth/blogs/zuckerberg-back-to-coding-claude-code" rel="noopener noreferrer"&gt;Zuckerberg returning to the codebase after 20 years using Claude Code&lt;/a&gt; is the same story. So is &lt;a href="https://monkfrom.earth/blogs/karpathy-autoresearch-explained-ml-to-marketing" rel="noopener noreferrer"&gt;Karpathy explaining AI workflows to people who do not write code&lt;/a&gt;. &lt;strong&gt;AI is not replacing the work. It is eliminating the translation layers between people who do different kinds of work.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One signal worth noting.&lt;/strong&gt; Mike Krieger, Anthropic's CPO and Instagram co-founder, resigned from Figma's board on April 14, three days before Claude Design launched. He had joined less than a year earlier. The resignation was disclosed to the SEC the same day The Information reported Anthropic was building design tools. &lt;sup id="fnref6"&gt;6&lt;/sup&gt; The adjacency was close enough for the board seat to become untenable, even though the two products are aimed at different jobs.&lt;/p&gt;

&lt;p&gt;The market read the adjacency in real time.&lt;/p&gt;


&lt;blockquote&gt;
&lt;p&gt;Anthropic Labs launched Claude Design, a new product for creating visual assets, prototypes, slides, and one-pagers with Claude.&lt;br&gt;&lt;br&gt;It is rolling out in research preview to Pro, Max, Team, and Enterprise users, powered by Claude Opus 4.7.&lt;a href="https://twitter.com/search?q=%24ADBE&amp;amp;src=ctag&amp;amp;ref_src=twsrc%5Etfw" rel="noopener noreferrer"&gt;$ADBE&lt;/a&gt; &lt;a href="https://twitter.com/search?q=%24FIG&amp;amp;src=ctag&amp;amp;ref_src=twsrc%5Etfw" rel="noopener noreferrer"&gt;$FIG&lt;/a&gt; &lt;a href="https://t.co/5u0TOMSqSW" rel="noopener noreferrer"&gt;&lt;/a&gt;&lt;a href="https://t.co/5u0TOMSqSW" rel="noopener noreferrer"&gt;https://t.co/5u0TOMSqSW&lt;/a&gt; &lt;a href="https://t.co/TblMIEJE4u" rel="noopener noreferrer"&gt;pic.twitter.com/TblMIEJE4u&lt;/a&gt;&lt;/p&gt;— Wall St Engine (@wallstengine) &lt;a href="https://twitter.com/wallstengine/status/2045163733203501378?ref_src=twsrc%5Etfw" rel="noopener noreferrer"&gt;April 17, 2026&lt;/a&gt;
&lt;/blockquote&gt; 
&lt;h2&gt;
  
  
  What Are Claude Design's Limitations Right Now?
&lt;/h2&gt;

&lt;p&gt;Claude Design is a research preview as of April 2026. Real constraints worth knowing before you try it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No multiplayer.&lt;/strong&gt; For a design team on a shared canvas, Figma still wins cleanly. &lt;sup id="fnref5"&gt;5&lt;/sup&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token burn is heavy.&lt;/strong&gt; Claude Design runs on Opus 4.7 and is metered separately from your chat and Claude Code usage. Pro is described as "quick explorations, one-off use." One user reported two design sessions consuming 58% of their weekly Pro allowance. &lt;sup id="fnref7"&gt;7&lt;/sup&gt; To use it regularly, you need Max.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prototyping-level output, not production polish.&lt;/strong&gt; The design system extraction makes things brand-consistent, but it is not a replacement for a designer's eye on the final layer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Export options are practical&lt;/strong&gt; but limited: PDF, PPTX, standalone HTML, Canva. &lt;sup id="fnref8"&gt;8&lt;/sup&gt; The HTML export is also how the Claude Code handoff closes the loop. Anthropic's own ecosystem, end to end.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  A 5-Question Claude Design Readiness Check
&lt;/h2&gt;

&lt;p&gt;Before you open it, ask these. If you answer yes to three or more, Claude Design fits your workflow today. If not, Figma, Lovable, or v0 is probably the better tool for the job.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Do you already have a shipped product in a GitHub repo?&lt;/strong&gt; Claude Design starts from code that exists. No repo, no extraction.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Is your design system encoded in Tailwind config, CSS variables, or component names?&lt;/strong&gt; That is what the extractor reads. Design tokens locked in a Figma file alone will not transfer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Are you extending an existing product rather than starting from zero?&lt;/strong&gt; The tool's edge is fit to what is already there. For greenfield work, Lovable or v0 is closer to the job.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Can one person own the design-to-code loop, or does it need multiplayer?&lt;/strong&gt; No shared canvas. If three designers need to work on the same file, Figma still wins.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Are you on Max, or willing to rate-limit yourself on Pro?&lt;/strong&gt; Two sessions burned 58% of a weekly Pro allowance. Regular use needs Max.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you counted three or more yeses, the translation step this tool removes is a real one in your workflow.&lt;/p&gt;
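&lt;p&gt;To make question 2 concrete: tokens that live in code are machine-readable. Anthropic has not published how its extractor works; this toy sketch only illustrates what "encoded in CSS variables" means, with invented token names:&lt;/p&gt;

```python
import re

# Toy illustration only: Anthropic has not published its extractor.
# The point is that tokens kept in code are machine-readable,
# while tokens locked in a design file are not.
css = """
:root {
  --color-primary: #6c5ce7;
  --color-surface: #ffffff;
  --radius-card: 12px;
}
"""

def extract_tokens(stylesheet):
    """Pull CSS custom properties into a name -> value map."""
    return dict(re.findall(r"--([\w-]+)\s*:\s*([^;]+);", stylesheet))

tokens = extract_tokens(css)
print(tokens["color-primary"])  # #6c5ce7
```

&lt;p&gt;Anything expressed this way, or as a Tailwind theme or component names, is already structured data. A Figma file alone is not.&lt;/p&gt;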

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Figma, Lovable, v0, and Claude Design &lt;strong&gt;pick different starting points.&lt;/strong&gt; Different starting points, different jobs.&lt;/li&gt;
&lt;li&gt;Figma treats the &lt;strong&gt;design file as canonical.&lt;/strong&gt; Claude Design treats the &lt;strong&gt;codebase as canonical.&lt;/strong&gt; Neither is wrong; they suit different teams.&lt;/li&gt;
&lt;li&gt;Claude Design's design system extraction reads your Tailwind, CSS, and component names to generate &lt;strong&gt;on-brand output from the first prompt&lt;/strong&gt;, without manual configuration.&lt;/li&gt;
&lt;li&gt;Each tool fits a different starting point: &lt;strong&gt;Figma for collaborative design work, Lovable for greenfield apps, v0 for quick components, Claude Design for extending an existing codebase.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Token burn is real.&lt;/strong&gt; Claude Design is metered separately. Pro is for one-off use. Regular use requires Max.&lt;/li&gt;
&lt;li&gt;Anthropic's CPO &lt;strong&gt;resigned from Figma's board three days before launch.&lt;/strong&gt; Figma's stock dropped 5 to 7% on launch day, a read of the adjacency, not a verdict on either product.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Shipping something where this trade-off matters and want a second read on it? &lt;a href="mailto:hi@monkfrom.earth?subject=Claude%20Design%20take"&gt;Get in touch&lt;/a&gt;. I reply to every thoughtful email.&lt;/p&gt;

&lt;p&gt;I post builder-first takes on AI tooling on &lt;a href="https://linkedin.com/in/monkfromearth" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;, &lt;a href="https://x.com/monkfromearth" rel="noopener noreferrer"&gt;X&lt;/a&gt;, and &lt;a href="https://instagram.com/monkfrom.earth" rel="noopener noreferrer"&gt;Instagram&lt;/a&gt;. The kind that skip the hype and go straight to what changes for people who ship. If that is useful, a follow goes a long way.&lt;/p&gt;







&lt;ol&gt;

&lt;li id="fn1"&gt;
&lt;p&gt;&lt;a href="https://www.figma.com/make/" rel="noopener noreferrer"&gt;Figma Make: AI-powered design tools&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn2"&gt;
&lt;p&gt;&lt;a href="https://www.nocode.mba/articles/lovable-vs-V0" rel="noopener noreferrer"&gt;Lovable vs v0: Which AI Builder Is Better? nocode.mba&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn3"&gt;
&lt;p&gt;&lt;a href="https://support.claude.com/en/articles/14604397-set-up-your-design-system-in-claude-design" rel="noopener noreferrer"&gt;Set up your design system in Claude Design, Claude Help Center&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn4"&gt;
&lt;p&gt;&lt;a href="https://www.youtube.com/watch?v=o4jIKc_DIoM" rel="noopener noreferrer"&gt;Claude Design vs Lovable: live side-by-side comparison, YouTube&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn5"&gt;
&lt;p&gt;&lt;a href="https://www.eigent.ai/blog/claude-design-vs-figma-make" rel="noopener noreferrer"&gt;Claude Design vs Figma Make: 2026 AI Design Tool Comparison, eigent.ai&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn6"&gt;
&lt;p&gt;&lt;a href="https://techcrunch.com/2026/04/16/anthropic-cpo-leaves-figmas-board-after-reports-he-will-offer-a-competing-product/" rel="noopener noreferrer"&gt;Anthropic CPO leaves Figma's board after reports he will offer a competing product, TechCrunch&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn7"&gt;
&lt;p&gt;&lt;a href="https://www.testingcatalog.com/anthropic-launches-claude-design-ai-tool-for-paid-plans/" rel="noopener noreferrer"&gt;Anthropic launches Claude Design AI tool for paid plans, Testing Catalog&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn8"&gt;
&lt;p&gt;&lt;a href="https://www.anthropic.com/news/claude-design-anthropic-labs" rel="noopener noreferrer"&gt;Introducing Claude Design by Anthropic Labs&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;

</description>
      <category>ai</category>
      <category>design</category>
      <category>ui</category>
      <category>ux</category>
    </item>
    <item>
      <title>Meta Muse Spark: What Meta Is Actually Betting On</title>
      <dc:creator>Sameer Khan</dc:creator>
      <pubDate>Fri, 17 Apr 2026 12:33:55 +0000</pubDate>
      <link>https://dev.to/monkfromearth/meta-muse-spark-what-meta-is-actually-betting-on-1794</link>
      <guid>https://dev.to/monkfromearth/meta-muse-spark-what-meta-is-actually-betting-on-1794</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; Meta launched Muse Spark on April 8, 2026. Most commentary split into two camps. Meta went closed because Meta won. Meta went closed because Meta lost. Both miss what Meta actually built. Muse Spark does frontier-class reasoning in &lt;strong&gt;less than half the output tokens&lt;/strong&gt; Claude Opus 4.6 and GPT-5.4 spend on the same benchmark, and Meta AI, the product serving roughly &lt;strong&gt;three billion daily active users&lt;/strong&gt;, runs on it. Read Muse Spark as an efficiency-first, patiently sequenced, consumer-scale bet, and the choices that look strange on their own start fitting together.&lt;/p&gt;

&lt;p&gt;The week Muse Spark launched, the conversation split almost immediately. One camp said Meta finally caught up and closed the doors. Another said Meta finally fell behind and is hiding it. Both sides were arguing about the license. Neither was arguing about the model.&lt;/p&gt;

&lt;p&gt;The bet Meta actually made isn't captured by the license. It's captured by three choices that are easy to miss through the open-weights lens. Muse Spark is designed for &lt;strong&gt;fewer tokens per query&lt;/strong&gt;. It is framed as &lt;strong&gt;step one of a long sequence&lt;/strong&gt;. And it is shipping first as the engine of a consumer product reaching &lt;strong&gt;three billion daily active users&lt;/strong&gt;. Those three choices, taken together, describe a different game than the one most labs are playing.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Muse Spark is
&lt;/h2&gt;

&lt;p&gt;Muse Spark is Meta Superintelligence Labs' first model, shipped April 8 after a nine-month rebuild of Meta's AI infrastructure. &lt;sup id="fnref1"&gt;1&lt;/sup&gt; It is a natively multimodal reasoning model with three modes. Instant for fast responses. Thinking for reasoning-heavy queries. Contemplating, positioned against Gemini Deep Think and GPT Pro for long scientific work. It supports tool use, visual chain of thought, and multi-agent orchestration. &lt;sup id="fnref2"&gt;2&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;Meta AI, the consumer product on meta.ai and the Meta AI app, runs on it today. The Muse Spark API is in private preview for selected partners. Alexandr Wang, Meta's Chief AI Officer, has said broader API access is coming. &lt;sup id="fnref3"&gt;3&lt;/sup&gt; The weights have not been released, and Meta has not committed to whether or when they will be.&lt;/p&gt;

&lt;p&gt;On the Artificial Analysis Intelligence Index v4.0, Muse Spark scores 52. GPT-5.4 and Gemini 3.1 Pro Preview score 57. Claude Opus 4.6 scores 53. &lt;sup id="fnref4"&gt;4&lt;/sup&gt; Fourth at the frontier, as the frontier is currently measured.&lt;/p&gt;

&lt;h2&gt;
  
  
  Efficiency is the number that matters
&lt;/h2&gt;

&lt;p&gt;Meta's headline technical claim is that Muse Spark reaches its capabilities with over an order of magnitude less compute than Llama 4 Maverick, the prior Meta flagship. &lt;sup id="fnref1"&gt;1&lt;/sup&gt; That is a training-side claim. The more interesting number sits on the inference side.&lt;/p&gt;

&lt;p&gt;To complete the Artificial Analysis Intelligence Index v4.0 run, Muse Spark used &lt;strong&gt;58 million output tokens&lt;/strong&gt;. Claude Opus 4.6 used &lt;strong&gt;157 million&lt;/strong&gt;. GPT-5.4 used &lt;strong&gt;120 million&lt;/strong&gt;. &lt;sup id="fnref4"&gt;4&lt;/sup&gt; Muse Spark reaches roughly the same tier of performance while spending less than half the thinking time of its closest competitors.&lt;/p&gt;

&lt;p&gt;Meta calls the mechanism &lt;strong&gt;thought compression&lt;/strong&gt;. During reinforcement learning, the model is penalized for excessive reasoning tokens. It is trained to reach the same answer with fewer intermediate steps. &lt;sup id="fnref4"&gt;4&lt;/sup&gt;&lt;/p&gt;
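&lt;p&gt;The mechanism is easy to sketch. A minimal toy of that reward shaping, assuming a simple linear token penalty (the 0.001 weight is invented for illustration, not Meta's published value):&lt;/p&gt;

```python
# Toy sketch of a length-penalized RL reward: correctness is rewarded,
# reasoning tokens are taxed, so a shorter trace that reaches the same
# answer scores higher. The penalty weight is illustrative only.
def reward(correct, reasoning_tokens, penalty=0.001):
    return (1.0 if correct else 0.0) - penalty * reasoning_tokens

verbose = reward(True, 800)   # ~0.2
concise = reward(True, 200)   # ~0.8
assert concise > verbose      # same answer, fewer tokens, higher reward
```

&lt;p&gt;Optimize against that signal over enough training and the model learns to spend thinking tokens only where they buy accuracy.&lt;/p&gt;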

&lt;p&gt;Zoom out. Llama 4 Maverick scored &lt;strong&gt;18&lt;/strong&gt; on the same index. Muse Spark scores &lt;strong&gt;52&lt;/strong&gt;. &lt;sup id="fnref4"&gt;4&lt;/sup&gt; A &lt;strong&gt;nearly 3x jump&lt;/strong&gt; in one release, using roughly a tenth of the training compute, producing a model that serves answers in less than half the output tokens of its peers. That is not a fourth-place story. It is a different-axis story.&lt;/p&gt;

&lt;p&gt;Thought compression isn't the only lever. Fei Xia, a Meta researcher, showed Muse Spark tackling a hard visual counting task using parallel subagents: divide the image into a grid, assign a subagent per tile, merge the counts. &lt;sup id="fnref5"&gt;5&lt;/sup&gt; That is a second axis of test-compute scaling. Not fewer tokens per query, but many smaller queries instead of one large one. Both compound efficiency at inference time.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhve0hn6bw8c8pt0pztvr.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhve0hn6bw8c8pt0pztvr.jpg" alt="Fei Xia, a Meta researcher, used Muse Spark to count birds in a dense flock. The model divided the image into a 4x4 grid, ran parallel subagents per tile, and returned per-tile counts of 24, 50, 46, 11 across the top row, summing to 431 across the frame. A note flags the count as a conservative lower bound because of overlapping birds and sub-threshold specks." width="800" height="530"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Matt Ridley, in &lt;em&gt;How Innovation Works&lt;/em&gt;, argues that real technological progress almost never looks like a breakthrough in the moment. It looks like &lt;strong&gt;compounded efficiency&lt;/strong&gt;. &lt;sup id="fnref6"&gt;6&lt;/sup&gt; The Wright brothers didn't fly higher than their competitors; they iterated longer. Meta's claim with Muse Spark is that the same mechanism is back in large language models as the active design constraint. Fewer tokens per query, optimized over releases, compounded.&lt;/p&gt;

&lt;p&gt;Under the efficiency thesis, the contribution is the training recipe, not the weights. The productized result at three billion DAUs is what the recipe is for.&lt;/p&gt;

&lt;h2&gt;
  
  
  Patience as a structural choice
&lt;/h2&gt;

&lt;p&gt;Wang's launch thread called Muse Spark &lt;strong&gt;"step one."&lt;/strong&gt; &lt;sup id="fnref3"&gt;3&lt;/sup&gt; Meta has named three modes, shipped two of them, and placed Contemplating on a published roadmap. The release itself followed a &lt;strong&gt;nine-month rebuild&lt;/strong&gt; of Meta's internal AI infrastructure before any new model went out. &lt;sup id="fnref1"&gt;1&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;That pattern is uncommon. Labs announce quarterly, deprecate on shorter cycles, and trade nomenclature every six weeks. A frontier lab committing to a staged ladder with named but unbuilt later steps is the exception.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frrphsou62v1tyd0t033t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frrphsou62v1tyd0t033t.png" alt="Muse Spark Contemplating mode benchmark table from Meta's launch. On Humanity's Last Exam (No Tools), Muse Spark scores 50.2, Gemini 3.1 Deep Think 48.4, GPT 5.4 Pro 43.9. With Tools: 58.4, 53.4, 58.7. IPhO 2025 Theory: 82.6, 87.7, 93.5. FrontierScience Research: 38.3, 23.3, 36.7. The ladder's later rungs already produce numbers that compete with Deep Think and GPT Pro." width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Jeff Bezos's 1997 shareholder letter made a version of this argument nearly three decades earlier: "We will continue to make investment decisions in light of long-term market leadership considerations rather than short-term profitability considerations." &lt;sup id="fnref7"&gt;7&lt;/sup&gt; Most companies quote the line. Very few behave like it. Muse Spark is Meta behaving like it. A nine-month silence, a named sequence, an efficiency-first architecture that only pays back at scale.&lt;/p&gt;

&lt;p&gt;Patience has a failure mode. If the ladder breaks, the gap widens. If competitors keep improving quarterly and Muse Spark's step two arrives in 2027, the index score will read worse, not better. That is the actual risk of the strategy. Not the license. The cadence.&lt;/p&gt;

&lt;h2&gt;
  
  
  The game Meta is actually playing
&lt;/h2&gt;

&lt;p&gt;Roughly three billion daily active users touch Meta's products. Muse Spark powers Meta AI across them. &lt;sup id="fnref1"&gt;1&lt;/sup&gt; Every prompt, every caption suggestion, every smart reply, every image generation across meta.ai, Instagram, WhatsApp, and Facebook is a query served at Meta's cost.&lt;/p&gt;

&lt;p&gt;Reread the efficiency numbers with that denominator. 58 million output tokens per benchmark run is interesting when you run one benchmark. It is structural when you run hundreds of billions of inferences. Cutting thinking time by more than half is how inference economics actually move at Meta's scale.&lt;/p&gt;
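&lt;p&gt;A back-of-envelope sketch makes the point. The token totals below are the cited Artificial Analysis v4.0 runs; the query volume, answer length, and price are invented round figures to show the shape of the economics, not real Meta numbers:&lt;/p&gt;

```python
# Benchmark token totals are the cited Artificial Analysis v4.0 runs.
# Everything after them is an invented round number for illustration.
muse_tokens = 58e6
opus_tokens = 157e6
efficiency = muse_tokens / opus_tokens   # ~0.37 of competitor token spend

queries_per_day = 1e9          # assumption
tokens_per_answer = 500        # assumption
usd_per_million_tokens = 5.0   # assumption

daily_cost = queries_per_day * tokens_per_answer / 1e6 * usd_per_million_tokens
savings = daily_cost * (1 - efficiency)
print(f"spend ratio {efficiency:.2f} -> ${savings:,.0f}/day saved at scale")
```

&lt;p&gt;The exact inputs do not matter. What matters is that the saving multiplies by query volume, and Meta's query volume is the largest in the industry.&lt;/p&gt;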

&lt;p&gt;The API is a secondary product. The primary product is a feature inside applications people already use. That framing answers most of the questions that the closed-weights decision seems to raise:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Why closed:&lt;/strong&gt; weight distribution gives up the only part that is uniquely Meta, which is distribution plus efficient inference under Meta's control.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Why efficiency-first:&lt;/strong&gt; cost-per-query is the load-bearing variable at three billion users.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Why fourth on the index:&lt;/strong&gt; the index measures capability, not capability per dollar of inference. Meta is not optimizing for the thing the index measures.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Why patience:&lt;/strong&gt; product cycles at Meta's scale run in quarters and years, not weeks. A staged ladder matches the cadence of the products that will ship the model.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;OpenAI, Anthropic, and Google primarily sell access. Meta does not. Meta bundles. A closed, efficient model embedded in consumer distribution is a product shape no other frontier lab has a direct answer to right now.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Muse Spark bets against
&lt;/h2&gt;

&lt;p&gt;Muse Spark bets against three premises that have held in AI for three years. That benchmark rank drives strategic outcomes. That fast iteration beats staged iteration. That serving the weights is the dominant form of distribution.&lt;/p&gt;

&lt;p&gt;If Meta is right, competitors re-architect. Expect tokens-per-benchmark to become a reported number. Expect ladder-style release roadmaps. Expect fewer labs selling raw access and more labs selling integrated products.&lt;/p&gt;

&lt;p&gt;If Meta is wrong, Muse Spark stays fourth on the index, the efficiency claim gets normalized by competitors' next releases, and the Scale-era thesis fades into another nine-month rebuild.&lt;/p&gt;

&lt;p&gt;Deedy, in a popular thread after launch, called Muse Spark's reasoning "solid but not best in class." &lt;sup id="fnref5"&gt;5&lt;/sup&gt; That read is fair if you are benchmarking reasoning. It is beside the point if you are measuring how to serve reasoning to three billion people.&lt;/p&gt;

&lt;h2&gt;
  
  
  Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Efficiency is the headline, not the license.&lt;/strong&gt; Muse Spark uses 58 million output tokens where Claude Opus 4.6 uses 157 million on the same evaluation. &lt;sup id="fnref4"&gt;4&lt;/sup&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Training efficiency is roughly ten times Llama 4 Maverick.&lt;/strong&gt; The index score nearly tripled in one release. &lt;sup id="fnref1"&gt;1&lt;/sup&gt; &lt;sup id="fnref4"&gt;4&lt;/sup&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Patience is the structural bet.&lt;/strong&gt; A nine-month rebuild, a three-mode ladder, a second-step roadmap that is named but not shipped. &lt;sup id="fnref1"&gt;1&lt;/sup&gt; &lt;sup id="fnref3"&gt;3&lt;/sup&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Specialization explains the choices.&lt;/strong&gt; Meta AI reaches three billion DAUs, and inference economics at that scale reward low tokens per query, not high leaderboard rank. &lt;sup id="fnref1"&gt;1&lt;/sup&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The license is a symptom of the strategy.&lt;/strong&gt; If efficiency plus distribution plus patience is the bet, releasing the weights gives the bet away.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I've been writing about &lt;a href="https://monkfrom.earth/blogs/good-products-hard-to-vary" rel="noopener noreferrer"&gt;how constraints shape design, not features&lt;/a&gt; for a while, and Muse Spark is a useful instance of the pattern. The interesting move in AI this year might not be the model that scores higher. It might be the model that answers in fewer tokens and ships inside an application a billion people already open every day.&lt;/p&gt;

&lt;p&gt;I break things like this down on &lt;a href="https://linkedin.com/in/monkfromearth" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;, &lt;a href="https://x.com/monkfromearth" rel="noopener noreferrer"&gt;X&lt;/a&gt;, and &lt;a href="https://instagram.com/monkfrom.earth" rel="noopener noreferrer"&gt;Instagram&lt;/a&gt;. Usually shorter, sometimes as carousels. If this read resonated, you'd probably like those.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;




&lt;ol&gt;

&lt;li id="fn1"&gt;
&lt;p&gt;&lt;a href="https://ai.meta.com/blog/introducing-muse-spark-msl/" rel="noopener noreferrer"&gt;Meta AI, "Introducing Muse Spark"&lt;/a&gt;, April 8, 2026 ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn2"&gt;
&lt;p&gt;&lt;a href="https://simonwillison.net/2026/Apr/8/muse-spark/" rel="noopener noreferrer"&gt;Simon Willison, "Meta's new model is Muse Spark"&lt;/a&gt;, April 8, 2026 ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn3"&gt;
&lt;p&gt;&lt;a href="https://x.com/alexandr_wang/status/2041909376508985381" rel="noopener noreferrer"&gt;Alexandr Wang on X, launch thread and API update&lt;/a&gt;, April 2026 ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn4"&gt;
&lt;p&gt;&lt;a href="https://www.datacamp.com/blog/muse-spark" rel="noopener noreferrer"&gt;Muse Spark: Features, Benchmarks, and How to Use It, DataCamp&lt;/a&gt;, April 2026 ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn5"&gt;
&lt;p&gt;&lt;a href="https://x.com/xf1280/status/2043730980264128673" rel="noopener noreferrer"&gt;Fei Xia and Deedy Das on Muse Spark capabilities (thread)&lt;/a&gt;, April 13, 2026 ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn6"&gt;
&lt;p&gt;Matt Ridley, &lt;em&gt;How Innovation Works: And Why It Flourishes in Freedom&lt;/em&gt; (HarperCollins, 2020) ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn7"&gt;
&lt;p&gt;&lt;a href="https://www.sec.gov/Archives/edgar/data/1018724/000119312513151836/d511111dex991.htm" rel="noopener noreferrer"&gt;Jeff Bezos, 1997 Letter to Shareholders&lt;/a&gt;, Amazon ↩&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>agents</category>
      <category>opensource</category>
    </item>
    <item>
      <title>GPT-5.4-Cyber explained: OpenAI's cyber-only AI</title>
      <dc:creator>Sameer Khan</dc:creator>
      <pubDate>Wed, 15 Apr 2026 08:33:55 +0000</pubDate>
      <link>https://dev.to/monkfromearth/gpt-54-cyber-explained-openais-cyber-only-ai-1nhn</link>
      <guid>https://dev.to/monkfromearth/gpt-54-cyber-explained-openais-cyber-only-ai-1nhn</guid>
      <description>&lt;p&gt;Two days ago I wrote about &lt;a href="https://monkfrom.earth/blogs/claude-mythos-autonomous-cyberattack" rel="noopener noreferrer"&gt;Claude Mythos completing AISI's 32-step cyberattack chain end-to-end&lt;/a&gt;. On April 14, OpenAI put out the clearest signal yet that the labs are reading the same capability curve and building the defender track in advance.&lt;/p&gt;

&lt;p&gt;They announced &lt;strong&gt;GPT-5.4-Cyber&lt;/strong&gt;, a version of GPT-5.4 fine-tuned to be "cyber-permissive," and scaled up their &lt;strong&gt;Trusted Access for Cyber (TAC)&lt;/strong&gt; program to thousands of verified individual defenders and hundreds of teams defending critical software.&lt;sup id="fnref1"&gt;1&lt;/sup&gt; In their own words, this is shipping "in preparation for increasingly more capable models over the next few months."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; This is defender tooling shipped before the next capability jump, not after. The model is the headline. The real story is a fine-tuned permissive variant named, tiered, and published as a product.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa1p49fy2jdmp8yhcz84a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa1p49fy2jdmp8yhcz84a.png" alt="OpenAI's April 14, 2026 announcement: Trusted access for the next era of cyber defense" width="800" height="352"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Primary source: &lt;a href="https://openai.com/index/scaling-trusted-access-for-cyber-defense/" rel="noopener noreferrer"&gt;OpenAI on scaling trusted access for cyber defense&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Does GPT-5.4-Cyber Actually Unlock for Defenders?
&lt;/h2&gt;

&lt;p&gt;Same base model as GPT-5.4, different refusal boundary. OpenAI's description: a model that "lowers the refusal boundary for legitimate cybersecurity work" and adds capabilities like binary reverse engineering. It can analyze compiled software for malware, vulnerabilities, and robustness without access to source code.&lt;sup id="fnref1"&gt;1&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;Binary reverse engineering is the concrete unlock, and it is not small. It is one of the highest-leverage things a defender can automate, and it is exactly the kind of request that trips every refusal classifier ever built. The same prompt from a malicious actor yields the same output. The model cannot tell them apart. The verification layer can.&lt;/p&gt;

&lt;p&gt;Everything else in the envelope is less dramatic but more useful at scale. Vulnerability research without the hedging. Security education that answers the question instead of warning about it. Defensive programming help that does not refuse to describe the attack it is trying to prevent.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Was Refusal Always a Bad Safeguard?
&lt;/h2&gt;

&lt;p&gt;For three years, the default safety move has been to push risk into the model through refusal training. It was the cheapest thing to ship and the easiest thing to measure. It also quietly assumed attackers and defenders use the same tool, so making the tool worse would hurt both evenly.&lt;/p&gt;

&lt;p&gt;That assumption was always wrong. Attackers run local models, jailbroken models, and purpose-built tooling. Refusals mostly tax the defenders trying to follow the rules.&lt;/p&gt;

&lt;p&gt;GPT-5.4 (classified "high" cyber capability under OpenAI's Preparedness Framework) keeps its refusal boundary for the public. The permissive variant ships only to people who have agreed to be identified. This is closer to how physical-world dual-use actually works. Pharmacies stock dangerous drugs behind an identity check, not behind a refusal. Labs buy restricted reagents with a license. The safeguard is not the molecule. It is the paperwork.&lt;/p&gt;
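&lt;p&gt;The shape of the safeguard is simple to sketch. The capability stays constant; the gate is identity. The tier registry and check below are invented for illustration, not OpenAI's actual TAC implementation:&lt;/p&gt;

```python
# Toy sketch of identity-gated access. Same model, same prompt; only
# the verification layer differs. Registry and check are invented,
# not OpenAI's actual TAC implementation.
VERIFIED_DEFENDERS = {"org:example-security-team"}

def answer(prompt, requester_id, sensitive):
    if sensitive and requester_id not in VERIFIED_DEFENDERS:
        return "refused: requires verified access"
    return f"full answer to: {prompt}"

print(answer("analyze this binary", "anonymous", sensitive=True))
print(answer("analyze this binary", "org:example-security-team", sensitive=True))
```

&lt;p&gt;The refusal moves out of the weights and into the enrollment pipeline. That is the whole architectural shift.&lt;/p&gt;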

&lt;h2&gt;
  
  
  GPT-5.4-Cyber and the Mythos Parallel
&lt;/h2&gt;

&lt;p&gt;My last three posts on Claude Mythos describe the same shape from different angles. &lt;a href="https://monkfrom.earth/blogs/claude-mythos-system-card" rel="noopener noreferrer"&gt;The system card&lt;/a&gt; showed a model with enough situational awareness to conceal its own actions. &lt;a href="https://monkfrom.earth/blogs/anthropic-glasswing-ai-cybersecurity" rel="noopener noreferrer"&gt;Project Glasswing&lt;/a&gt; showed the same model finding thousands of zero-days in critical open-source infrastructure. The &lt;a href="https://monkfrom.earth/blogs/claude-mythos-autonomous-cyberattack" rel="noopener noreferrer"&gt;AISI cyber range&lt;/a&gt; showed it running a full 32-step autonomous cyberattack. Mythos itself is gated. Anthropic ships it only through its own trust program.&lt;/p&gt;

&lt;p&gt;So both frontier labs already operate the same model: dual-use capability behind verified access. What is new with GPT-5.4-Cyber is that OpenAI is the first to take the defender side of that model and publish it as a product tier: a named, fine-tuned, cyber-permissive variant with its own enrollment path and its own preparedness designation. Anthropic's gating is a policy. OpenAI's is a SKU.&lt;/p&gt;

&lt;p&gt;You can see the same bet in the numbers they quietly dropped in the same post. Codex Security has contributed to &lt;strong&gt;over 3,000 critical and high vulnerability fixes&lt;/strong&gt; since launch. Codex for Open Source has reached &lt;strong&gt;more than 1,000 open source projects&lt;/strong&gt;. The &lt;strong&gt;$10M Cybersecurity Grant Program&lt;/strong&gt; keeps funding defender tooling.&lt;sup id="fnref1"&gt;1&lt;/sup&gt; In the Mythos cyberattack post I wrote: &lt;em&gt;"I'd bet on it eventually, but 'eventually' and 'right now' are different things in security."&lt;/em&gt; This is a lab betting "right now," on the defender side, and betting it visibly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who Verifies the Verifier?
&lt;/h2&gt;

&lt;p&gt;This is the uncomfortable follow-up to any identity-gated safeguard. OpenAI is now the identity layer for a meaningful slice of the security industry. Every defender applying for the permissive tier is trusting one company's KYC pipeline to decide who counts as a defender, and trusting OpenAI's interpretation of "legitimate use" to hold up over time.&lt;/p&gt;

&lt;p&gt;This is the part of the announcement I would most want to see discussed over the next few weeks. It is also the part nobody will discuss, because the new model is shinier than the policy question behind it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GPT-5.4-Cyber&lt;/strong&gt; is a fine-tuned GPT-5.4 with fewer capability restrictions, shipped only to vetted defenders under the Trusted Access for Cyber program.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Preemptive, not reactive.&lt;/strong&gt; OpenAI is shipping this ahead of more capable base models coming in the next few months, in their own words.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Both labs already gate dual-use.&lt;/strong&gt; Mythos is restricted through Anthropic's trust program. What is new is OpenAI naming a fine-tuned permissive variant as a product tier.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open question:&lt;/strong&gt; who audits the identity layer when OpenAI and Anthropic become the KYC gate for a chunk of the security industry?&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;I break down AI safety and capability stories on &lt;a href="https://linkedin.com/in/monkfromearth" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;, &lt;a href="https://x.com/monkfromearth" rel="noopener noreferrer"&gt;X&lt;/a&gt;, and &lt;a href="https://instagram.com/monkfrom.earth" rel="noopener noreferrer"&gt;Instagram&lt;/a&gt;. If this resonated, you would probably like those too.&lt;/p&gt;







&lt;ol&gt;

&lt;li id="fn1"&gt;
&lt;p&gt;&lt;a href="https://openai.com/index/scaling-trusted-access-for-cyber-defense/" rel="noopener noreferrer"&gt;OpenAI on scaling trusted access for cyber defense (April 14, 2026)&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>security</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Claude Mythos Is the First AI to Complete a Full Corporate Cyberattack End-to-End</title>
      <dc:creator>Sameer Khan</dc:creator>
      <pubDate>Mon, 13 Apr 2026 17:35:25 +0000</pubDate>
      <link>https://dev.to/monkfromearth/claude-mythos-is-the-first-ai-to-complete-a-full-corporate-cyberattack-end-to-end-3mk5</link>
      <guid>https://dev.to/monkfromearth/claude-mythos-is-the-first-ai-to-complete-a-full-corporate-cyberattack-end-to-end-3mk5</guid>
      <description>&lt;p&gt;The UK's AI Security Institute confirmed this week that Claude Mythos, an Anthropic model, became the first AI to complete their cyber range end-to-end.&lt;sup id="fnref1"&gt;1&lt;/sup&gt; The range is a &lt;strong&gt;32-step corporate network attack&lt;/strong&gt; scenario. Human experts estimate the same attack would take them &lt;strong&gt;20 hours&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The institute's recommendation to organizations: keep your software updated. Use access controls. Enable logging.&lt;/p&gt;

&lt;p&gt;The gap between those two sentences is the part of this story I keep returning to.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; Claude Mythos ran a full autonomous cyberattack, 32 steps, end-to-end, in a scenario that takes human experts 20 hours. It is the first AI to complete AISI's cyber range. The official response was to recommend basic security hygiene. The mismatch between the capability and the response is where the real story lives.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Did AI Go From Basic Cyber Tasks to a Full Autonomous Cyberattack?
&lt;/h2&gt;

&lt;p&gt;Self-driving cars give me the cleanest parallel here.&lt;/p&gt;

&lt;p&gt;For a decade, every individual piece of the self-driving puzzle existed as a demo. Lane-keeping worked. Adaptive cruise worked. Automated parking worked. What didn't exist, for years, was the full ride. Door to door, no human touching the wheel. When Waymo's first commercial robotaxi picked up a passenger in 2020, what changed wasn't the individual capabilities. It was the threshold: chaining all of them into one uninterrupted ride.&lt;/p&gt;

&lt;p&gt;The same thing just happened in offensive cybersecurity.&lt;/p&gt;

&lt;p&gt;Each step of a network attack has been within reach of AI models for a while. Reconnaissance. Crafting payloads. Pivoting through a subnet. Covering tracks. What didn't exist was a model that could chain all 32 of those steps together without a human stepping in between. Claude Mythos did.&lt;/p&gt;

&lt;p&gt;In 2023, leading AI models struggled with basic cybersecurity tasks. Not sophisticated ones. Basic ones. Three years later, one of them drove the entire route.&lt;/p&gt;

&lt;p&gt;AISI published the actual curve, and it is worth looking at directly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcgraaer1ehytkwzcembp.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcgraaer1ehytkwzcembp.jpeg" alt="AISI evaluation showing average steps completed on 'The Last Ones' cyber range per spent tokens. Claude Mythos Preview reaches around 22 steps on average and a maximum of roughly 32, clearly above Claude Opus 4.6, GPT-5.4, GPT-5.3 Codex, Claude Opus 4.5, Claude Sonnet 4.5, and GPT-4o." width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The red line is Mythos. GPT-4o sits near the bottom, completing around three steps before running out of tokens. Sonnet 4.5 gets to roughly 11. Opus 4.5 and the GPT-5 family cluster in the mid-teens. Opus 4.6 pushes past 16. Mythos is the only line that clears the middle milestones: C2 reverse engineering, advanced persistence, infrastructure compromise, and eventually M9, "Full network takeover."&lt;sup id="fnref1"&gt;1&lt;/sup&gt; The shape of that curve is what "first AI to complete the range end-to-end" actually looks like.&lt;/p&gt;

&lt;p&gt;AISI is careful about the current scope. The capability applies to "small, weakly defended, and vulnerable systems" given network access. Think of it as the robotaxi that only works on mapped, sunny, well-marked urban grids. Hardened enterprise infrastructure with proper controls is still a different problem, the same way a snowy mountain pass is still a different problem for Waymo.&lt;sup id="fnref1"&gt;1&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;The trajectory is what matters. 2023 to 2026 is three years.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Does an Autonomous Cyberattack Change the Security Equation?
&lt;/h2&gt;

&lt;p&gt;The asymmetry in security has always been simple: attackers need to find one gap, defenders need to close every door.&lt;/p&gt;

&lt;p&gt;AI doesn't change that asymmetry. It changes the cost of running an attack. An automated system doesn't need domain expertise to chain 32 steps. It doesn't get tired halfway through. It doesn't hesitate at unfamiliar territory.&lt;/p&gt;

&lt;p&gt;What previously required a skilled adversary with deep knowledge, time, and custom tools now requires API access and a goal.&lt;/p&gt;

&lt;p&gt;The same model AISI tested on offense has been used defensively in &lt;a href="https://dev.to/blogs/anthropic-glasswing-ai-cybersecurity"&gt;Anthropic's Project Glasswing&lt;/a&gt; to find thousands of zero-days in critical open-source infrastructure. Offense and defense, same capability, same model. The dual-use nature isn't incidental. It's structural. Whoever has the model has both sides.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Should Organizations Do After Claude Mythos Ran a Full Cyberattack?
&lt;/h2&gt;

&lt;p&gt;Patch your systems. Use MFA. Enable logging. AISI's recommendations are correct.&lt;/p&gt;

&lt;p&gt;But they were correct before this evaluation too. That's the part I can't get past.&lt;/p&gt;

&lt;p&gt;These recommendations address the baseline: opportunistic attackers, misconfigured systems, low-skill adversaries. They don't address the shift in assumptions that happens when a fully autonomous cyberattack chain becomes possible. Hygiene is still necessary. It is no longer sufficient as a strategy.&lt;/p&gt;

&lt;p&gt;AISI published a joint piece with the UK's National Cyber Security Centre on preparing defenders for frontier AI systems.&lt;sup id="fnref1"&gt;1&lt;/sup&gt; That collaboration exists because the people closest to this problem know the defensive tooling gap is real. The open question is whether the defensive side of AI moves as fast as the offensive side. I'd bet on it eventually, but "eventually" and "right now" are different things in security.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Does the Claude Mythos Evaluation Pattern Reveal?
&lt;/h2&gt;

&lt;p&gt;This is the third notable evaluation result for Claude Mythos in April alone. &lt;a href="https://dev.to/blogs/claude-mythos-system-card"&gt;The system card&lt;/a&gt; showed a model with enough situational awareness to conceal its own actions. Project Glasswing showed it finding thousands of vulnerabilities in critical infrastructure. The AISI cyber range shows it running a full autonomous cyberattack.&lt;/p&gt;

&lt;p&gt;These aren't contradictions. They are the same underlying capability applied in different contexts. A model capable enough for complex multi-step reasoning is capable enough to create real problems at scale.&lt;/p&gt;

&lt;p&gt;The value of these evaluations is that they name what's happening before it becomes a crisis, even when the recommendations that follow don't match the scale of what was just described. Naming it first is not nothing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Mythos&lt;/strong&gt; became the first AI to complete a &lt;strong&gt;32-step corporate cyberattack&lt;/strong&gt; chain end-to-end in AISI's cyber range&lt;/li&gt;
&lt;li&gt;Human experts estimate the same operation takes &lt;strong&gt;20 hours&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;In 2023, leading models couldn't complete basic cybersecurity tasks. Three years later, one completed a full autonomous cyberattack&lt;/li&gt;
&lt;li&gt;Current capability is scoped to "small, weakly defended" systems, not enterprise infrastructure with proper controls&lt;/li&gt;
&lt;li&gt;The trajectory matters more than the current benchmark: three years of rapid progress, with no signs of slowing&lt;/li&gt;
&lt;li&gt;AISI's defensive recommendations (patch, use MFA, enable logging) are correct but baseline — they predate this evaluation&lt;/li&gt;
&lt;li&gt;AISI and the UK NCSC published joint guidance on preparing defenders for frontier AI systems&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;I break down things like this on &lt;a href="https://linkedin.com/in/monkfromearth" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;, &lt;a href="https://x.com/monkfromearth" rel="noopener noreferrer"&gt;X&lt;/a&gt;, and &lt;a href="https://instagram.com/monkfrom.earth" rel="noopener noreferrer"&gt;Instagram&lt;/a&gt; — usually shorter, sometimes as carousels. If this resonated, you'd probably like those too.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;




&lt;ol&gt;

&lt;li id="fn1"&gt;
&lt;p&gt;&lt;a href="https://x.com/AISecurityInst/status/2043683577594794183" rel="noopener noreferrer"&gt;AI Security Institute (@AISecurityInst) — Claude Mythos cyber range evaluation, April 13, 2026&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;

</description>
      <category>security</category>
      <category>ai</category>
      <category>architecture</category>
      <category>news</category>
    </item>
    <item>
      <title>Zuckerberg Is Writing Code Again. With Claude Code.</title>
      <dc:creator>Sameer Khan</dc:creator>
      <pubDate>Sun, 05 Apr 2026 10:24:18 +0000</pubDate>
      <link>https://dev.to/monkfromearth/zuckerberg-is-writing-code-again-with-claude-code-26b1</link>
      <guid>https://dev.to/monkfromearth/zuckerberg-is-writing-code-again-with-claude-code-26b1</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; Mark Zuckerberg shipped 3 diffs to Meta's monorepo last month, his first code in 20 years. He's a heavy user of Claude Code CLI. One of his diffs got 200+ approvals from engineers who wanted to say they reviewed the CEO's code. He's not the only one. Garry Tan at Y Combinator is doing the same thing. The pattern is clear: AI coding tools are pulling founders back into the codebase.&lt;/p&gt;




&lt;h2&gt;
  
  
  What happened?
&lt;/h2&gt;

&lt;p&gt;Gergely Orosz at The Pragmatic Engineer &lt;a href="https://newsletter.pragmaticengineer.com/p/the-pulse-industry-leaders-return" rel="noopener noreferrer"&gt;reported this week&lt;/a&gt; that Mark Zuckerberg is back to writing code. Three diffs landed in Meta's monorepo in March 2026. His tool of choice: &lt;strong&gt;Claude Code CLI&lt;/strong&gt;, Anthropic's terminal-based AI coding assistant. &lt;sup id="fnref1"&gt;1&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;To put the scale in perspective: Meta's monorepo now has &lt;strong&gt;close to 100 million diffs&lt;/strong&gt;. Back in 2006, the entire Facebook codebase had fewer than 10,000. &lt;sup id="fnref1"&gt;1&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;Zuckerberg's last meaningful code contributions were in 2006. That's a 20-year gap. The fact that he's back, and using an AI tool to do it, says something about where we are.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 2010 diff that got force-merged
&lt;/h2&gt;

&lt;p&gt;This isn't Zuckerberg's first time making waves in code review.&lt;/p&gt;

&lt;p&gt;In 2010, he submitted a diff that made profile photos clickable on the profile page. Michael Novati, a senior engineer who would become the first person to hold Meta's L7 "coding machine" archetype, &lt;a href="https://newsletter.pragmaticengineer.com/p/the-coding-machine-at-meta" rel="noopener noreferrer"&gt;blocked it&lt;/a&gt;. The reason: formatting issues everywhere. &lt;sup id="fnref1"&gt;1&lt;/sup&gt; &lt;sup id="fnref2"&gt;2&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;Zuckerberg overrode the block and force-merged it. &lt;sup id="fnref1"&gt;1&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;Novati spent eight years at Meta and was recognized as the top code committer company-wide for several of them. The Pragmatic Engineer did &lt;a href="https://newsletter.pragmaticengineer.com/p/the-coding-machine-at-meta" rel="noopener noreferrer"&gt;a full episode&lt;/a&gt; with him about what it means to be a "coding machine" at that scale. &lt;sup id="fnref2"&gt;2&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;The 2010 story is funny in hindsight. But the 2026 version is different. This time, Zuckerberg isn't force-merging past reviewers. He's using AI to write code that engineers actually want to approve. &lt;strong&gt;One of his March diffs got more than 200 approvals&lt;/strong&gt;, with devs jumping at the chance to say they'd reviewed the CEO's work. &lt;sup id="fnref1"&gt;1&lt;/sup&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters beyond the anecdote
&lt;/h2&gt;

&lt;p&gt;Three diffs from the CEO of a 70,000-employee company is a footnote in a 100-million-diff monorepo. The signal isn't the code. It's the behavior.&lt;/p&gt;

&lt;p&gt;Zuckerberg isn't the only founder pulled back into the codebase by AI tools. Garry Tan, CEO of Y Combinator, &lt;a href="https://github.com/garrytan/gstack" rel="noopener noreferrer"&gt;returned to coding&lt;/a&gt; after 15 years and open-sourced gstack, a Claude Code system whose 23 specialist tools turn the CLI into a virtual engineering team: code reviewer, QA lead, security auditor, release engineer. &lt;sup id="fnref3"&gt;3&lt;/sup&gt; &lt;sup id="fnref4"&gt;4&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;Tobias Lütke, CEO of Shopify, has been running experiments with &lt;a href="https://dev.to/blogs/karpathy-autoresearch-explained-ml-to-marketing"&gt;Karpathy's AutoResearch&lt;/a&gt; on internal company data. 37 experiments overnight. 19% performance gain.&lt;/p&gt;

&lt;p&gt;I wrote about &lt;a href="https://dev.to/blogs/karpathy-autoresearch-explained-ml-to-marketing"&gt;how AutoResearch works&lt;/a&gt; a few days ago. The throughline is the same: AI tools are collapsing the gap between "person with ideas" and "person who ships code." Founders used to be the first type. AI is turning them back into the second.&lt;/p&gt;

&lt;h2&gt;
  
  
  Meta's bet: AI writes most of the code
&lt;/h2&gt;

&lt;p&gt;Zuckerberg coding again isn't a hobby. It's a signal of where Meta is heading.&lt;/p&gt;

&lt;p&gt;Leaked internal documents from March 2026 show aggressive targets. Meta's creation org wants &lt;strong&gt;65% of engineers writing 75% or more of their committed code using AI&lt;/strong&gt; by mid-2026. The Scalable Machine Learning org set a target of 50-80% AI-assisted code. &lt;sup id="fnref5"&gt;5&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;Zuckerberg himself said on Dwarkesh Patel's podcast that "in the next year, maybe half the development will be done by AI as opposed to people, and that will kind of increase from there." &lt;sup id="fnref6"&gt;6&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;He's not predicting this from the sidelines. He's using Claude Code in the terminal to ship diffs to his own monorepo. The CEO is the pilot customer.&lt;/p&gt;

&lt;h2&gt;
  
  
  The pattern worth watching
&lt;/h2&gt;

&lt;p&gt;There's a recurring shape here.&lt;/p&gt;

&lt;p&gt;Karpathy builds AutoResearch. Constrains the agent to one file, one metric, one 5-minute cycle. The constraint is the invention. Lütke runs it on Shopify data overnight. Marketers adapt it for landing pages.&lt;/p&gt;

&lt;p&gt;Anthropic builds Claude Code. Tan wraps it in 23 specialist agents. Zuckerberg uses it to ship his first code in 20 years.&lt;/p&gt;

&lt;p&gt;The tools don't just help engineers code faster. They re-open coding to people who stopped. Founders who moved into strategy, management, fundraising. People who haven't touched a codebase in a decade. The barrier to re-entry used to be months of catching up on tooling, frameworks, and conventions. Now it's a terminal and a prompt.&lt;/p&gt;

&lt;p&gt;That's a different kind of disruption than "AI replaces developers." It's closer to: AI brings back the builder-CEO. The person who can see a problem, describe a solution, and ship it before the meeting ends.&lt;/p&gt;

&lt;p&gt;Whether Zuckerberg's 3 diffs were good code is beside the point. The 200 engineers who approved them probably weren't reviewing for correctness. But the fact that a CEO can sit down with Claude Code and produce something that compiles, passes CI, and lands in a 100-million-diff monorepo? That's the new baseline.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Zuckerberg shipped 3 diffs&lt;/strong&gt; to Meta's monorepo in March 2026, his first code in ~20 years, using Claude Code CLI&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One diff got 200+ approvals&lt;/strong&gt; from engineers eager to review the CEO's code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Garry Tan&lt;/strong&gt; (Y Combinator) also returned to coding after 15 years, open-sourcing gstack for Claude Code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Meta targets 65% of engineers writing 75%+ of their committed code with AI&lt;/strong&gt; by mid-2026&lt;/li&gt;
&lt;li&gt;AI coding tools are pulling &lt;strong&gt;founders back into codebases&lt;/strong&gt; they left years ago&lt;/li&gt;
&lt;li&gt;The disruption isn't "AI replaces developers," it's &lt;strong&gt;"AI re-opens development"&lt;/strong&gt; to people who stopped&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;I break down things like this on &lt;a href="https://linkedin.com/in/monkfromearth" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;, &lt;a href="https://x.com/monkfromearth" rel="noopener noreferrer"&gt;X&lt;/a&gt;, and &lt;a href="https://instagram.com/monkfrom.earth" rel="noopener noreferrer"&gt;Instagram&lt;/a&gt;. If this resonated, you'd probably like those too.&lt;/p&gt;







&lt;ol&gt;

&lt;li id="fn1"&gt;
&lt;p&gt;&lt;a href="https://newsletter.pragmaticengineer.com/p/the-pulse-industry-leaders-return" rel="noopener noreferrer"&gt;The Pulse: Industry leaders return to coding with AI — The Pragmatic Engineer&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn2"&gt;
&lt;p&gt;&lt;a href="https://newsletter.pragmaticengineer.com/p/the-coding-machine-at-meta" rel="noopener noreferrer"&gt;"The Coding Machine" at Meta with Michael Novati — The Pragmatic Engineer&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn3"&gt;
&lt;p&gt;&lt;a href="https://github.com/garrytan/gstack" rel="noopener noreferrer"&gt;gstack — Garry Tan's Claude Code setup (GitHub)&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn4"&gt;
&lt;p&gt;&lt;a href="https://techcrunch.com/2026/03/17/why-garry-tans-claude-code-setup-has-gotten-so-much-love-and-hate/" rel="noopener noreferrer"&gt;Why Garry Tan's Claude Code setup has gotten so much love, and hate — TechCrunch&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn5"&gt;
&lt;p&gt;&lt;a href="https://www.theweek.in/news/sci-tech/2026/03/27/how-aggressive-is-mark-zuckerberg-s-ai-native-push-for-meta-leaked-documents-offer-new-details-on-coding-targets.html" rel="noopener noreferrer"&gt;How aggressive is Mark Zuckerberg's 'AI-native' push for Meta? — The Week&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn6"&gt;
&lt;p&gt;&lt;a href="https://www.dwarkesh.com/p/mark-zuckerberg-2" rel="noopener noreferrer"&gt;Mark Zuckerberg — AI will write most Meta code in 18 months — Dwarkesh Patel&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>career</category>
    </item>
    <item>
      <title>What OpenAI's $122 Billion Round Tells Us About AI's New Shape</title>
      <dc:creator>Sameer Khan</dc:creator>
      <pubDate>Sun, 05 Apr 2026 10:23:07 +0000</pubDate>
      <link>https://dev.to/monkfromearth/what-openais-122-billion-round-tells-us-about-ais-new-shape-58a7</link>
      <guid>https://dev.to/monkfromearth/what-openais-122-billion-round-tells-us-about-ais-new-shape-58a7</guid>
      <description>&lt;p&gt;On March 31, 2026, &lt;a href="https://openai.com" rel="noopener noreferrer"&gt;OpenAI&lt;/a&gt; closed a &lt;strong&gt;$122 billion&lt;/strong&gt; round at an &lt;strong&gt;$852 billion&lt;/strong&gt; valuation. Amazon put in $50 billion. Nvidia and SoftBank put in $30 billion each. Three billion came from retail investors. &lt;sup id="fnref1"&gt;1&lt;/sup&gt; &lt;sup id="fnref2"&gt;2&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;That single round is larger than every venture dollar raised across India's startup ecosystem in FY26 combined, which totalled $10.1 billion. &lt;sup id="fnref3"&gt;3&lt;/sup&gt; Two ecosystems, two different jobs being funded. More on that later.&lt;/p&gt;

&lt;p&gt;The reflex when you see numbers like $122B is to call it a bubble. I don't think it is. Look at what OpenAI has been doing with the capital, and the check starts to make sense. Not because OpenAI will definitely win. Because nobody else is attempting what OpenAI is attempting.&lt;/p&gt;

&lt;h2&gt;
  
  
  What OpenAI Is Actually Doing
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;The shape of a category being drawn&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In the past six weeks, OpenAI has moved at every economic layer where AI touches the world.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Media.&lt;/strong&gt; Acquired &lt;a href="https://tbpn.com" rel="noopener noreferrer"&gt;TBPN&lt;/a&gt;, a daily three-hour founder-focused tech show hosted by John Coogan and Jordi Hays, for a reported price in the low hundreds of millions. TBPN did $5M in ad revenue in 2025 and is on track for $30M in 2026. &lt;sup id="fnref4"&gt;4&lt;/sup&gt; OpenAI now owns three hours a day of the tech audience's attention.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consumer commerce.&lt;/strong&gt; ChatGPT Agent shipped with &lt;a href="https://walmart.com" rel="noopener noreferrer"&gt;Walmart&lt;/a&gt; integration for agentic shopping. Users browse, compare, and buy inside ChatGPT. &lt;sup id="fnref5"&gt;5&lt;/sup&gt; First agentic commerce deployment at national retail scale.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enterprise data and delivery.&lt;/strong&gt; &lt;a href="https://snowflake.com" rel="noopener noreferrer"&gt;Snowflake&lt;/a&gt; signed a $200 million multi-year partnership putting OpenAI's models directly inside enterprise data warehouses. &lt;sup id="fnref6"&gt;6&lt;/sup&gt; &lt;a href="https://accenture.com" rel="noopener noreferrer"&gt;Accenture&lt;/a&gt; is handling enterprise implementation and delivery. &lt;sup id="fnref7"&gt;7&lt;/sup&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Developer surface.&lt;/strong&gt; Codex now ships as a plugin inside Claude Code, &lt;a href="https://anthropic.com" rel="noopener noreferrer"&gt;Anthropic's&lt;/a&gt; coding agent. &lt;sup id="fnref8"&gt;8&lt;/sup&gt; OpenAI's model, running on their competitor's surface.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Infrastructure.&lt;/strong&gt; The Stargate project is a $500 billion compute buildout across seven sites, with ~7 GW of planned capacity. &lt;sup id="fnref9"&gt;9&lt;/sup&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No other AI-native company is operating across all five layers. Anthropic stays deep and narrow on models plus Claude Code. &lt;a href="https://google.com" rel="noopener noreferrer"&gt;Google&lt;/a&gt; is retrofitting Gemini into an existing conglomerate. &lt;a href="https://x.ai" rel="noopener noreferrer"&gt;xAI&lt;/a&gt; has one distribution surface, which is X. Chinese players face different constraints and a different market. Microsoft is already a conglomerate, and owns 27% of OpenAI anyway. &lt;sup id="fnref10"&gt;10&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;OpenAI is alone in attempting the breadth.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Edison Pattern
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Why building the surround is the innovation&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Matt Ridley makes a quiet argument in &lt;em&gt;How Innovation Works&lt;/em&gt; that's worth sitting with. The light bulb, he writes, was invented &lt;strong&gt;at least 23 times&lt;/strong&gt; before Edison. Joseph Swan had a working version. So did Heinrich Göbel, Hiram Maxim, Alexander Lodygin, and roughly twenty others. &lt;sup id="fnref11"&gt;11&lt;/sup&gt; Edison's genius wasn't the filament. It was understanding that a bulb is useless on its own.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"He was the first to bring everything together, to combine it with a system of generating and distributing electricity."&lt;/em&gt;&lt;br&gt;
— Matt Ridley, on Edison&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So Edison built the surround. Generators, copper distribution, meters, fuses, junction boxes, domestic wiring standards. He opened Pearl Street Station in 1882 as the first commercial central power plant because without it, the bulb could not be sold. He didn't invent electricity any more than he invented the bulb. He built the &lt;strong&gt;economy&lt;/strong&gt; that made both useful.&lt;/p&gt;

&lt;p&gt;Ridley's larger claim is that innovation is almost always incremental and collective, not heroic. What looks like one person's breakthrough is usually a decades-long relay. The genius lies in &lt;strong&gt;assembly&lt;/strong&gt;, in drawing together the necessary surrounding pieces so the core idea can actually be used.&lt;/p&gt;

&lt;p&gt;Read OpenAI's $122B through that lens. The frontier model isn't the innovation. Anthropic has one. Google has one. DeepSeek has one. Several companies are, as Ridley would say, thinking simultaneously about similar solutions. What OpenAI is building is the surround. Media, commerce, enterprise data, developer surfaces, compute infrastructure. The things that make the model &lt;em&gt;usable as an economy&lt;/em&gt;, not just as a tool.&lt;/p&gt;

&lt;p&gt;Whether they're drawing the right surround is the open question. That they're drawing it at all is what separates them from everyone else.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Gets Cut
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Reading direction by what someone walks away from&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;A large check is easy to turn into sprawl. What keeps this from being sprawl is visible in what OpenAI has walked away from in the same six-week window.&lt;/p&gt;

&lt;p&gt;The Sora consumer video app is shutting down April 26. &lt;sup id="fnref12"&gt;12&lt;/sup&gt; The &lt;a href="https://thewaltdisneycompany.com" rel="noopener noreferrer"&gt;Disney&lt;/a&gt; licensing deal, which included a $1 billion equity investment, never closed. &lt;sup id="fnref13"&gt;13&lt;/sup&gt; Sora's user count had collapsed from 1 million to under 500,000, and the app was burning roughly $1 million a day. &lt;sup id="fnref14"&gt;14&lt;/sup&gt; OpenAI walked from a live $1B check.&lt;/p&gt;

&lt;p&gt;The Stargate Abilene expansion, 600 MW of additional capacity, was cancelled in March. &lt;a href="https://oracle.com" rel="noopener noreferrer"&gt;Oracle&lt;/a&gt; publicly cited OpenAI's "often-changing demand forecasting" as the reason negotiations collapsed. &lt;sup id="fnref15"&gt;15&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;I wrote in an earlier post about how &lt;a href="https://dev.to/blogs/good-products-hard-to-vary"&gt;good products are hard to vary&lt;/a&gt;. Every element load-bearing, nothing extra. That principle has a corporate version. A good strategy, at this scale, is also hard to vary. Every layer of breadth has to earn its place. Sora didn't. The Abilene expansion couldn't. Whether the remaining layers will is the bet.&lt;/p&gt;

&lt;p&gt;You can read a lot about what someone believes by what they refuse to keep paying for.&lt;/p&gt;

&lt;h2&gt;
  
  
  Two Bets, Same Wave
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;What India's $10B is actually funding&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Back to the opening comparison. India's $10.1B across FY26 and OpenAI's $122B in a single round are not the same job, and the contrast is interesting for that reason, not because one is bigger.&lt;/p&gt;

&lt;p&gt;OpenAI's capital funds &lt;strong&gt;platform creation&lt;/strong&gt;. It flows toward compute, model capability, enterprise partnerships, distribution surfaces, and acquisitions that lock in attention.&lt;/p&gt;

&lt;p&gt;India's capital funds &lt;strong&gt;founders building on top of platforms&lt;/strong&gt;. Early-stage funding jumped 58% year-over-year in Q1 2026, while $100M+ deals hit zero for the first time since 2022. &lt;sup id="fnref16"&gt;16&lt;/sup&gt; The capital is intentionally horizontal: thousands of bets on use cases that assume a platform already exists.&lt;/p&gt;

&lt;p&gt;Both bets are rational. They sit at different layers of the same wave. One is Edison at Pearl Street. The other is the thousands of businesses that came alive the day the grid turned on: factories, streetcars, radios, refrigerators, telegrams. Neither layer makes sense without the other.&lt;/p&gt;

&lt;h2&gt;
  
  
  What To Watch
&lt;/h2&gt;

&lt;p&gt;Watch OpenAI not to see who wins AI, but to see what the new category actually looks like. $122 billion is the price of drawing that shape in real time. OpenAI happens to be holding the pencil.&lt;/p&gt;

&lt;p&gt;Whether this bet works will take three to five years to know. Meanwhile, the shape itself is the interesting thing. An AI-native attempt at breadth, at sovereign-fund scale, before the category even has settled edges.&lt;/p&gt;

&lt;p&gt;Nobody has tried this before in AI. That's the news.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;OpenAI raised $122 billion in one round at an $852 billion valuation, more than India's entire startup ecosystem raised in a year&lt;/li&gt;
&lt;li&gt;The capital services a category-creation bet, not a product bet&lt;/li&gt;
&lt;li&gt;Direction shows up in the cuts: Sora killed, $1B Disney investment walked away from, Stargate Abilene expansion cancelled&lt;/li&gt;
&lt;li&gt;OpenAI is the only AI-native company attempting breadth across media, consumer, enterprise, developer, and infrastructure simultaneously&lt;/li&gt;
&lt;li&gt;Anthropic stays narrow, Google retrofits, xAI has one surface, Chinese players are constrained by market and chip access&lt;/li&gt;
&lt;li&gt;India's $10.1B funds founders building on platforms. OpenAI's $122B funds being the platform. Different jobs, both real.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;I break down things like this on &lt;a href="https://linkedin.com/in/monkfromearth" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;, &lt;a href="https://x.com/monkfromearth" rel="noopener noreferrer"&gt;X&lt;/a&gt;, and &lt;a href="https://instagram.com/monkfrom.earth" rel="noopener noreferrer"&gt;Instagram&lt;/a&gt;. If this resonated, you'd probably like those too.&lt;/p&gt;







&lt;ol&gt;

&lt;li id="fn1"&gt;
&lt;p&gt;OpenAI, &lt;a href="https://openai.com/index/accelerating-the-next-phase-ai/" rel="noopener noreferrer"&gt;"Accelerating the next phase of AI"&lt;/a&gt; (March 31, 2026). ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn2"&gt;
&lt;p&gt;Bloomberg, &lt;a href="https://www.bloomberg.com/news/articles/2026-03-31/openai-valued-at-852-billion-after-completing-122-billion-round" rel="noopener noreferrer"&gt;"OpenAI Valued at $852 Billion After Completing $122 Billion Round"&lt;/a&gt; (March 31, 2026). Amazon $50B ($35B contingent on IPO/AGI), Nvidia $30B, SoftBank $30B. Retail investors $3B via TechCrunch. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn3"&gt;
&lt;p&gt;Economic Times via LinkedIn News, FY26 India startup funding totals $10.1 billion, down 9% YoY. Moneycontrol/Bain-IVCA reported VC fundraising rebounded to ~$5.4 billion in 2025. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn4"&gt;
&lt;p&gt;TechCrunch, &lt;a href="https://techcrunch.com/2026/04/02/openai-acquires-tbpn-the-buzzy-founder-led-business-talk-show/" rel="noopener noreferrer"&gt;"OpenAI acquires TBPN"&lt;/a&gt; (April 2, 2026). TBPN sits within OpenAI's Strategy org under Chris Lehane. Editorial independence preserved. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn5"&gt;
&lt;p&gt;Digital Commerce 360, &lt;a href="https://www.digitalcommerce360.com/2026/03/24/openai-agentic-commerce-updates-chatgpt-walmart/" rel="noopener noreferrer"&gt;"OpenAI reveals updates to its agentic commerce experience for ChatGPT"&lt;/a&gt; (March 24, 2026). ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn6"&gt;
&lt;p&gt;Snowflake, &lt;a href="https://www.snowflake.com/en/news/press-releases/snowflake-and-openAI-forge-200-million-partnership-to-bring-enterprise-ready-ai-to-the-worlds-most-trusted-data-platform/" rel="noopener noreferrer"&gt;"Snowflake and OpenAI Forge $200 Million Partnership"&lt;/a&gt;. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn7"&gt;
&lt;p&gt;OpenAI, &lt;a href="https://openai.com/index/accenture-partnership/" rel="noopener noreferrer"&gt;"Accenture and OpenAI accelerate enterprise AI success"&lt;/a&gt;. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn8"&gt;
&lt;p&gt;OpenAI Codex Plugin for Claude Code, &lt;a href="https://github.com/openai/codex-plugin-cc" rel="noopener noreferrer"&gt;github.com/openai/codex-plugin-cc&lt;/a&gt;. Commands include &lt;code&gt;/codex:review&lt;/code&gt;, &lt;code&gt;/codex:adversarial-review&lt;/code&gt;, &lt;code&gt;/codex:rescue&lt;/code&gt;. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn9"&gt;
&lt;p&gt;OpenAI, &lt;a href="https://openai.com/index/announcing-the-stargate-project/" rel="noopener noreferrer"&gt;"Announcing The Stargate Project"&lt;/a&gt;. $500 billion planned investment over four years. Nearly 7 GW across flagship Abilene site, five new sites, and CoreWeave partnerships. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn10"&gt;
&lt;p&gt;OpenAI, &lt;a href="https://openai.com/index/next-chapter-of-microsoft-openai-partnership/" rel="noopener noreferrer"&gt;"The next chapter of the Microsoft-OpenAI partnership"&lt;/a&gt;. Microsoft holds ~27% on as-converted diluted basis, ~$135 billion value post-recap. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn11"&gt;
&lt;p&gt;Matt Ridley, &lt;a href="https://en.wikipedia.org/wiki/How_Innovation_Works" rel="noopener noreferrer"&gt;&lt;em&gt;How Innovation Works and Why It Flourishes in Freedom&lt;/em&gt;&lt;/a&gt; (2020). Ridley draws on Robert Friedel, Paul Israel, and Bernard Finn's history of the incandescent bulb, which identifies at least 23 inventors who produced working versions before Edison. Ridley's argument: "Edison was the first to bring everything together, to combine it with a system of generating and distributing electricity." Pearl Street Station opened in Manhattan on September 4, 1882 as the world's first commercial central power plant. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn12"&gt;
&lt;p&gt;The Decoder, &lt;a href="https://the-decoder.com/openai-sets-two-stage-sora-shutdown-with-app-closing-april-2026-and-api-following-in-september/" rel="noopener noreferrer"&gt;"OpenAI sets two-stage Sora shutdown"&lt;/a&gt;. App discontinued April 26, 2026. API discontinued September 24, 2026. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn13"&gt;
&lt;p&gt;Variety, &lt;a href="https://variety.com/2026/digital/news/openai-shutting-down-sora-video-disney-1236698277/" rel="noopener noreferrer"&gt;"OpenAI Will Shut Down Sora Video App; Disney Drops Plans for $1 Billion Investment"&lt;/a&gt;. Original Disney-OpenAI Sora agreement (December 2025) included $1B equity investment plus warrants, 3-year licensing of 200+ Disney/Marvel/Pixar/Star Wars characters. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn14"&gt;
&lt;p&gt;TechCrunch, &lt;a href="https://techcrunch.com/2026/03/29/why-openai-really-shut-down-sora/" rel="noopener noreferrer"&gt;"Why OpenAI really shut down Sora"&lt;/a&gt;. User count peaked near 1 million, fell below 500,000. App burning roughly $1 million per day. Sora research team pivoting to world simulation for robotics. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn15"&gt;
&lt;p&gt;Noah Bean, &lt;a href="https://medium.com/@noahbean3396/stargates-first-crack-reveals-the-fault-lines-beneath-ai-s-trillion-dollar-buildout-1a3e5476b760" rel="noopener noreferrer"&gt;"Stargate's first crack reveals the fault lines"&lt;/a&gt; (March 2026). Oracle and OpenAI abandoned plans to expand Abilene from 1.2 GW to ~2.0 GW. Oracle cited financing terms and OpenAI's "often-changing demand forecasting". ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn16"&gt;
&lt;p&gt;Inc42, &lt;a href="https://inc42.com/reports/indian-tech-startup-funding-report-q1-2026/" rel="noopener noreferrer"&gt;"Indian Tech Startup Funding Report Q1 2026"&lt;/a&gt;. Q1 2026 funding: $2.3 billion (-26% YoY). Zero $100M+ deals, first time since 2022. Early-stage +58% YoY. 48% of investors call AI the most investment-ready sector, fewer than 10% willing to pay premium valuations. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;

</description>
      <category>ai</category>
      <category>openapi</category>
      <category>discuss</category>
      <category>news</category>
    </item>
    <item>
      <title>Axios Supply Chain Attack: How North Korean Hackers Social-Engineered an Open Source Maintainer</title>
      <dc:creator>Sameer Khan</dc:creator>
      <pubDate>Fri, 03 Apr 2026 17:43:44 +0000</pubDate>
      <link>https://dev.to/monkfromearth/axios-supply-chain-attack-how-north-korean-hackers-social-engineered-an-open-source-maintainer-2ae9</link>
      <guid>https://dev.to/monkfromearth/axios-supply-chain-attack-how-north-korean-hackers-social-engineered-an-open-source-maintainer-2ae9</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; North Korean hackers built a fake company, complete with a Slack workspace, LinkedIn activity, and a full team of fake profiles, to trick the lead maintainer of axios into installing malware. One Teams meeting later, they had full control of his machine. They used that access to push malicious versions of a library with &lt;strong&gt;100 million weekly downloads&lt;/strong&gt;. The attack was live for 3 hours. It's the most sophisticated social engineering of an open source maintainer we've seen, and it exposes gaps in npm's security model that no amount of 2FA can fix.&lt;/p&gt;




&lt;p&gt;On March 31, 2026, two versions of axios that had never been through the project's CI pipeline appeared on npm. Versions 1.14.1 and 0.30.4 both carried a new dependency nobody had seen before: &lt;code&gt;plain-crypto-js&lt;/code&gt;. &lt;sup id="fnref1"&gt;1&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;Within six minutes, Socket's automated scanner flagged the package. &lt;sup id="fnref2"&gt;2&lt;/sup&gt; Within three hours, npm pulled both versions. But in those three hours, an unknown number of developers, CI pipelines, and production systems had already installed a cross-platform Remote Access Trojan.&lt;/p&gt;

&lt;p&gt;The story of &lt;em&gt;how&lt;/em&gt; those versions got published is more interesting than the malware itself.&lt;/p&gt;

&lt;h2&gt;
  
  
  How do you trick someone who maintains code for 100 million developers?
&lt;/h2&gt;

&lt;p&gt;Jason Saayman, the lead maintainer of axios, &lt;a href="https://github.com/axios/axios/issues/10636#issuecomment-4180237789" rel="noopener noreferrer"&gt;shared the playbook&lt;/a&gt; in the project's post-mortem. &lt;sup id="fnref1"&gt;1&lt;/sup&gt; It reads less like a hacking story and more like a con movie.&lt;/p&gt;

&lt;p&gt;The attackers reached out masquerading as the founder of a real company. They had cloned the founder's identity and the company itself. Then came the invite to a Slack workspace.&lt;/p&gt;

&lt;p&gt;This wasn't a hastily thrown-together channel. The workspace was branded with the company's visual identity. It had channels where "team members" shared the company's LinkedIn posts (likely linking to the real company's account). There were fake profiles for the company's team &lt;em&gt;and&lt;/em&gt; for other open source maintainers, giving the whole setup social proof.&lt;/p&gt;

&lt;p&gt;After establishing trust through the Slack workspace, they scheduled an MS Teams meeting. Multiple people appeared to be on the call. During the meeting, something on Saayman's system was flagged as "out of date." He installed the update, thinking it was related to Teams.&lt;/p&gt;

&lt;p&gt;That update was the RAT.&lt;/p&gt;

&lt;p&gt;"Everything was extremely well co-ordinated, looked legit and was done in a professional manner," Saayman &lt;a href="https://github.com/axios/axios/issues/10636#issuecomment-4180237789" rel="noopener noreferrer"&gt;wrote&lt;/a&gt;. &lt;sup id="fnref1"&gt;1&lt;/sup&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Saayman wasn't the only target
&lt;/h2&gt;

&lt;p&gt;Weeks before the axios compromise, &lt;a href="https://github.com/axios/axios/issues/10636" rel="noopener noreferrer"&gt;voxpelli&lt;/a&gt;, a maintainer of packages like Mocha, described a nearly identical approach. &lt;sup id="fnref1"&gt;1&lt;/sup&gt; Someone invited him to be on a "podcast." A week of lead-up followed: social media images, preparatory interview questions, other guests in a group chat. Everything felt real.&lt;/p&gt;

&lt;p&gt;When it came time to "record," the fake streaming website claimed a connection issue and tried to get him to install a non-notarized macOS app. When he refused, they tried a &lt;code&gt;curl&lt;/code&gt; command to download and run something. When that failed too, they went dark and deleted every conversation.&lt;/p&gt;

&lt;p&gt;"It's creepy how they target you, no matter if they are real people or possibly AI," voxpelli wrote. &lt;sup id="fnref1"&gt;1&lt;/sup&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What the malware actually did
&lt;/h2&gt;

&lt;p&gt;The technical chain was clean. &lt;code&gt;plain-crypto-js@4.2.1&lt;/code&gt; used a &lt;code&gt;postinstall&lt;/code&gt; hook to run &lt;code&gt;setup.js&lt;/code&gt;, a 4,209-byte dropper obfuscated with reversed Base64 and XOR cipher (key: &lt;code&gt;OrDeR_7077&lt;/code&gt;). &lt;sup id="fnref2"&gt;2&lt;/sup&gt;&lt;/p&gt;
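&lt;p&gt;The layering is easy to reproduce. Here's a toy round trip of the reverse-then-Base64 half of the scheme; the real dropper adds an XOR pass with that key on top, and the payload below is obviously a stand-in:&lt;/p&gt;

```shell
# Build a toy "obfuscated" blob the way the dropper's author would:
# plaintext -> Base64 -> reversed string (the real setup.js also XORs first).
payload=$(printf 'hello world' | base64 | rev)

# Deobfuscate the way the dropper does: reverse, then Base64-decode.
printf '%s\n' "$payload" | rev | base64 -d
```

&lt;p&gt;The point of the scheme is that the C2 domain and other strings never appear in cleartext in the published tarball, so a naive &lt;code&gt;grep&lt;/code&gt; over &lt;code&gt;node_modules&lt;/code&gt; finds nothing.&lt;/p&gt;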

&lt;p&gt;It deployed platform-specific payloads:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;macOS:&lt;/strong&gt; A C++ binary disguised as an Apple system daemon at &lt;code&gt;/Library/Caches/com.apple.act.mond&lt;/code&gt;, supporting remote code execution and process injection &lt;sup id="fnref2"&gt;2&lt;/sup&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Windows:&lt;/strong&gt; Renamed &lt;code&gt;powershell.exe&lt;/code&gt; to &lt;code&gt;wt.exe&lt;/code&gt; (disguised as Windows Terminal), launched via VBScript &lt;sup id="fnref2"&gt;2&lt;/sup&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Linux:&lt;/strong&gt; A Python script at &lt;code&gt;/tmp/ld.py&lt;/code&gt; running as a detached process &lt;sup id="fnref2"&gt;2&lt;/sup&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All variants beaconed every 60 seconds to &lt;code&gt;sfrclak[.]com&lt;/code&gt;. The dropper then cleaned up after itself: deleted &lt;code&gt;setup.js&lt;/code&gt;, deleted &lt;code&gt;package.json&lt;/code&gt;, and renamed a clean backup to &lt;code&gt;package.json&lt;/code&gt;. The directory looked normal after execution. &lt;sup id="fnref2"&gt;2&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;Google/Mandiant attributes the malware to &lt;strong&gt;UNC1069&lt;/strong&gt;, a North Korea-nexus threat actor active since 2018, based on overlap with the WAVESHAPER backdoor family. &lt;sup id="fnref3"&gt;3&lt;/sup&gt; Microsoft independently attributes it to &lt;strong&gt;Sapphire Sleet&lt;/strong&gt;, also North Korean. &lt;sup id="fnref4"&gt;4&lt;/sup&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  2FA didn't matter. That's the real story.
&lt;/h2&gt;

&lt;p&gt;Saayman had two-factor authentication enabled on his npm account. It didn't help.&lt;/p&gt;

&lt;p&gt;Once a RAT has full control of your machine, software-based TOTP is just another application the attacker can interact with. They changed his npm email to a Proton Mail address under their control (&lt;code&gt;ifstap@proton.me&lt;/code&gt;) and used a long-lived classic npm access token to publish. &lt;sup id="fnref5"&gt;5&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;Here's what makes this worse: axios had &lt;em&gt;already&lt;/em&gt; been publishing through OIDC with provenance attestations since 2023. The last four legitimate v1 releases all went through GitHub Actions with Trusted Publishing. The malicious v1.14.1 had neither provenance nor attestations. Any tool checking for this would have flagged it instantly. &lt;sup id="fnref6"&gt;6&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;But npm has no setting to enforce OIDC-only publishing. There is no way to tell the registry: "reject anything not published through CI." The strictest option npm offers still allows local &lt;code&gt;npm publish&lt;/code&gt; with a browser-based 2FA prompt, which a RAT can trivially intercept. &lt;sup id="fnref6"&gt;6&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;As contributor shaanmajid &lt;a href="https://github.com/axios/axios/issues/10636" rel="noopener noreferrer"&gt;put it&lt;/a&gt;: "The only mitigation on Axios's end that could have actually prevented this would have been using hardware FIDO2 keys for maintainer npm auth, which can't be hijacked by a RAT." &lt;sup id="fnref6"&gt;6&lt;/sup&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What would have actually prevented this?
&lt;/h2&gt;

&lt;p&gt;Three things, none of which axios alone could control:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Registry-level OIDC enforcement.&lt;/strong&gt; If npm allowed packages to opt in to "reject all non-OIDC publishes," the RAT would have been useless for publishing. Other registries like crates.io already support this. &lt;sup id="fnref6"&gt;6&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Dependency cooldown periods.&lt;/strong&gt; The malicious versions were live for 3 hours. A 3-day cooldown on new versions (supported by Dependabot, Renovate, uv, and bun via &lt;code&gt;minimumReleaseAge&lt;/code&gt;) would have meant zero downloads of the poisoned packages. &lt;sup id="fnref6"&gt;6&lt;/sup&gt;&lt;/p&gt;
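&lt;p&gt;For Renovate, the cooldown is one rule in &lt;code&gt;renovate.json&lt;/code&gt;. This is Renovate's spelling of the option (&lt;code&gt;minimumReleaseAge&lt;/code&gt;, per its config schema); the other tools name their equivalents differently, so treat this as a sketch to adapt:&lt;/p&gt;

```json
{
  "packageRules": [
    {
      "matchManagers": ["npm"],
      "minimumReleaseAge": "3 days"
    }
  ]
}
```

&lt;p&gt;With this in place, the malicious 1.14.1 would still have been three days away from any automated update PR when npm pulled it.&lt;/p&gt;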

&lt;p&gt;&lt;strong&gt;3. Provenance verification by default.&lt;/strong&gt; Every legitimate axios v1 release had OIDC provenance. The malicious one didn't. If package managers verified attestations by default instead of opt-in, this would have been caught at install time. &lt;sup id="fnref6"&gt;6&lt;/sup&gt;&lt;/p&gt;
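&lt;p&gt;You can approximate this check today. The public npm registry exposes a &lt;code&gt;dist.attestations&lt;/code&gt; field on provenance-published versions, and &lt;code&gt;npm audit signatures&lt;/code&gt; verifies signatures and attestations for an installed tree. A minimal offline sketch of the decision, with the registry metadata stubbed in:&lt;/p&gt;

```shell
# Decide whether a version looks provenance-published. The metadata here is a
# stub; in practice you would feed in the output of: npm view axios@1.14.1 --json
python3 -c '
# stub of registry version metadata; the malicious release had no attestations
meta = {"name": "axios", "version": "1.14.1", "dist": {"tarball": "stub"}}
has_provenance = "attestations" in meta.get("dist", {})
print("ok: provenance present" if has_provenance else "SUSPECT: no attestations")
'
```

&lt;p&gt;Every legitimate v1 release would take the first branch; 1.14.1 takes the second. The signal was sitting in the registry the whole time.&lt;/p&gt;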

&lt;h2&gt;
  
  
  The pattern is bigger than axios
&lt;/h2&gt;

&lt;p&gt;This attack follows the playbook Google documented for UNC1069: social engineering that targets individuals in crypto and AI, building elaborate fake identities and companies to establish trust before delivering malware. &lt;sup id="fnref7"&gt;7&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;What's different here is the target. This wasn't a crypto startup founder. It was a maintainer of a general-purpose HTTP library embedded in millions of projects globally. The blast radius isn't one company's treasury. It's the software supply chain itself.&lt;/p&gt;

&lt;p&gt;Feross Aboukhadijeh, founder of Socket, &lt;a href="https://github.com/axios/axios/issues/10636" rel="noopener noreferrer"&gt;summarized it&lt;/a&gt;: "This kind of targeted social engineering against individual maintainers is the new normal. It's not a reflection on Jason or the axios team. These campaigns are sophisticated and persistent. We're seeing them across the ecosystem and they're only accelerating." &lt;sup id="fnref1"&gt;1&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;Singapore's Cyber Security Agency issued a formal advisory. &lt;sup id="fnref8"&gt;8&lt;/sup&gt; Microsoft, Google, SANS, Elastic, Snyk, Datadog, Huntress, and Malwarebytes all published analyses within days.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Social engineering is the attack vector.&lt;/strong&gt; The malware was simple. The social engineering was extraordinary. Fake companies, branded Slack workspaces, multi-person Teams calls, weeks of relationship building.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Software 2FA is not 2FA when your machine is compromised.&lt;/strong&gt; Hardware keys (FIDO2/WebAuthn) are the only defense against RAT-based credential theft.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;npm's security model has a structural gap.&lt;/strong&gt; There is no way to enforce "publish only from CI." Until registries support OIDC-only publishing, every maintainer's laptop is a viable attack surface.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Provenance attestations work, but nobody checks them.&lt;/strong&gt; The malicious version was missing attestations that every legitimate version had. The signal was there. The ecosystem isn't wired to use it yet.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dependency cooldowns are free protection.&lt;/strong&gt; Configure &lt;code&gt;minimumReleaseAge&lt;/code&gt; in your dependency tools. A 3-day delay would have neutralized this entire attack.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open source maintainers are high-value targets for state actors.&lt;/strong&gt; This is the new normal.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How to check if you're affected
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# plain-crypto-js is the reliable signal: it appears in every lockfile format.
# The exact "axios@1.14.1" string typically won't appear in package-lock.json
# or yarn.lock, which record names and versions separately.
grep -E "axios@(1\.14\.1|0\.30\.4)|plain-crypto-js" package-lock.json yarn.lock bun.lock pnpm-lock.yaml 2&amp;gt;/dev/null

# confirm the installed axios version directly
npm ls axios 2&amp;gt;/dev/null
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you find a match: downgrade to &lt;code&gt;axios@1.14.0&lt;/code&gt; or &lt;code&gt;0.30.3&lt;/code&gt;, remove &lt;code&gt;plain-crypto-js&lt;/code&gt; from &lt;code&gt;node_modules&lt;/code&gt;, rotate every secret and credential on the affected machine, and check network logs for connections to &lt;code&gt;sfrclak[.]com&lt;/code&gt; or &lt;code&gt;142.11.206.73&lt;/code&gt;. &lt;sup id="fnref1"&gt;1&lt;/sup&gt;&lt;/p&gt;




&lt;p&gt;I break down stories like this on &lt;a href="https://linkedin.com/in/monkfromearth" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;, &lt;a href="https://x.com/monkfromearth" rel="noopener noreferrer"&gt;X&lt;/a&gt;, and &lt;a href="https://instagram.com/monkfrom.earth" rel="noopener noreferrer"&gt;Instagram&lt;/a&gt;. If this was useful, you'd probably like those too.&lt;/p&gt;







&lt;ol&gt;

&lt;li id="fn1"&gt;
&lt;p&gt;&lt;a href="https://github.com/axios/axios/issues/10636" rel="noopener noreferrer"&gt;axios post-mortem and maintainer comments, GitHub Issues #10636&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn2"&gt;
&lt;p&gt;&lt;a href="https://socket.dev/blog/axios-npm-package-compromised" rel="noopener noreferrer"&gt;Socket technical analysis of axios compromise&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn3"&gt;
&lt;p&gt;&lt;a href="https://cloud.google.com/blog/topics/threat-intelligence/north-korea-threat-actor-targets-axios-npm-package" rel="noopener noreferrer"&gt;Google Cloud / Mandiant attribution to UNC1069&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn4"&gt;
&lt;p&gt;&lt;a href="https://www.microsoft.com/en-us/security/blog/2026/04/01/mitigating-the-axios-npm-supply-chain-compromise/" rel="noopener noreferrer"&gt;Microsoft Security Blog: Mitigating the Axios npm supply chain compromise&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn5"&gt;
&lt;p&gt;&lt;a href="https://thehackernews.com/2026/03/axios-supply-chain-attack-pushes-cross.html" rel="noopener noreferrer"&gt;The Hacker News: Axios Supply Chain Attack&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn6"&gt;
&lt;p&gt;&lt;a href="https://gist.github.com/shaanmajid/fa1bb71f063476f3e8fa726f54fd2d37" rel="noopener noreferrer"&gt;shaanmajid's registry evidence analysis&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn7"&gt;
&lt;p&gt;&lt;a href="https://cloud.google.com/blog/topics/threat-intelligence/unc1069-targets-cryptocurrency-ai-social-engineering" rel="noopener noreferrer"&gt;Google Cloud: UNC1069 social engineering playbook&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn8"&gt;
&lt;p&gt;&lt;a href="https://www.csa.gov.sg/alerts-and-advisories/advisories/ad-2026-002/" rel="noopener noreferrer"&gt;Singapore CSA Advisory AD-2026-002&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;

</description>
      <category>ai</category>
      <category>cybersecurity</category>
      <category>webdev</category>
      <category>opensource</category>
    </item>
    <item>
      <title>AI Doesn't Replace Thinking. It Replaces Forgetting.</title>
      <dc:creator>Sameer Khan</dc:creator>
      <pubDate>Fri, 03 Apr 2026 05:07:15 +0000</pubDate>
      <link>https://dev.to/monkfromearth/ai-doesnt-replace-thinking-it-replaces-forgetting-1hni</link>
      <guid>https://dev.to/monkfromearth/ai-doesnt-replace-thinking-it-replaces-forgetting-1hni</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; You've read thousands of articles. You can use almost none of them right now. The bottleneck in knowledge work isn't thinking. It's forgetting. Andrej Karpathy just showed a system where an LLM organizes your research into a living wiki, and the questions you ask feed back into it. No elaborate RAG pipelines. Just markdown, folders, and a loop that compounds.&lt;/p&gt;




&lt;p&gt;Think about how many articles you've read this year. Papers you've skimmed. Threads you've bookmarked. Podcasts you half-listened to while cooking.&lt;/p&gt;

&lt;p&gt;Now ask yourself: how many of those insights are available to you &lt;em&gt;right now&lt;/em&gt;, in this moment, for the thing you're working on today?&lt;/p&gt;

&lt;p&gt;The number is embarrassingly close to zero. Not because you're lazy. Not because you're not smart. Because your brain is a leaky bucket, and it always has been. You pour knowledge in, and most of it drains out before you need it. Every new project, every new question, you start from scratch. Even though the insight you need is somewhere in your past. You just can't reach it.&lt;/p&gt;

&lt;p&gt;That's the real problem with knowledge work. Not thinking. Forgetting.&lt;/p&gt;

&lt;h2&gt;
  
  
  What did Karpathy actually build?
&lt;/h2&gt;

&lt;p&gt;Andrej Karpathy, former Senior Director of AI at Tesla and founding member of OpenAI, &lt;a href="https://x.com/karpathy/status/2039805659525644595" rel="noopener noreferrer"&gt;shared a system&lt;/a&gt; this week that sounds almost too simple to be interesting. &lt;sup id="fnref1"&gt;1&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;Raw documents go into a folder. Articles, papers, repos, datasets, anything. An LLM reads them, then compiles everything into a structured markdown wiki. Summaries, backlinks, conceptual categories. Obsidian serves as the frontend. You browse the wiki like a personal Wikipedia.&lt;/p&gt;

&lt;p&gt;That part alone isn't new. People have been building "second brains" in Notion and Obsidian for years. The difference is what happens next.&lt;/p&gt;

&lt;p&gt;When you ask the system a question, the LLM doesn't just answer it. It researches its own wiki and synthesizes a response. Karpathy then often &lt;strong&gt;files that response back into the knowledge base&lt;/strong&gt;. The wiki grows. The next question is easier to answer because the system now knows more than it did an hour ago.&lt;/p&gt;

&lt;p&gt;Karpathy says he's running this at around 100 articles and 400,000 words. No elaborate RAG pipeline. Just organized markdown and an LLM that maintains its own indexes. "I rarely touch it directly," &lt;a href="https://x.com/karpathy/status/2039805659525644595" rel="noopener noreferrer"&gt;he wrote&lt;/a&gt;. &lt;sup id="fnref1"&gt;1&lt;/sup&gt;&lt;/p&gt;
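&lt;p&gt;The whole architecture fits in a few lines of shell. This skeleton is my sketch, not Karpathy's code: the &lt;code&gt;summarize&lt;/code&gt; and &lt;code&gt;ask&lt;/code&gt; functions are stand-ins for whatever LLM calls you'd wire in, and everything else really is just files and folders:&lt;/p&gt;

```shell
mkdir -p kb/raw kb/wiki
printf 'Markdown wikis compound with use.\n' > kb/raw/seed.txt  # demo document

# Stand-in for "an LLM reads the document and writes a wiki page".
summarize() { head -c 200 "$1"; }

# Stand-in for "the LLM researches its own wiki and synthesizes an answer".
ask() { grep -ih "$1" kb/wiki/*.md 2>/dev/null; }

# 1. Compile: every raw document becomes a wiki page.
for doc in kb/raw/*; do
  [ -e "$doc" ] || continue
  summarize "$doc" > "kb/wiki/$(basename "$doc").md"
done

# 2. The loop: the answer gets filed back in, so the wiki grows with use.
answer=$(ask "markdown")
printf '## Q: markdown\n%s\n' "$answer" >> kb/wiki/qa-log.md
```

&lt;p&gt;Run it twice and the second &lt;code&gt;ask&lt;/code&gt; also searches the filed answer from the first. That's the entire trick: using the system &lt;em&gt;is&lt;/em&gt; the maintenance.&lt;/p&gt;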

&lt;p&gt;Think of it like a research assistant who doesn't just answer your questions. They reorganize your entire filing cabinet after every conversation, so the next question takes half the time.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5f4ohnjqa9js7kcdbx1n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5f4ohnjqa9js7kcdbx1n.png" alt="The compounding knowledge loop: raw docs flow into an LLM wiki, questions make the wiki richer, answers get filed back" width="800" height="422"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why does the loop matter more than the tool?
&lt;/h2&gt;

&lt;p&gt;The tool is markdown files and Obsidian. You could rebuild this in a weekend. The &lt;em&gt;loop&lt;/em&gt; is what makes it work.&lt;/p&gt;

&lt;p&gt;Most "second brain" systems die. You start a Notion workspace, organize it beautifully for two weeks, then life happens and it decays. The organization was the hard part, and it depended entirely on you showing up to maintain it. You were the bottleneck.&lt;/p&gt;

&lt;p&gt;Karpathy's system flips that. The LLM maintains the organization. The LLM runs "health checks" to find inconsistencies and suggest new articles. The system maintains itself. Every time you use it, it gets better. Not because you put in extra effort, but because &lt;em&gt;using it is the maintenance&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;That's compound interest applied to knowledge. Each question doesn't just give you an answer. It makes every future question cheaper. The blank page dies, not because AI writes for you, but because AI &lt;em&gt;remembers&lt;/em&gt; for you.&lt;/p&gt;

&lt;p&gt;I wrote about &lt;a href="https://dev.to/blogs/karpathy-autoresearch-explained-ml-to-marketing"&gt;Karpathy's AutoResearch&lt;/a&gt; two days ago. A loop that runs ML experiments while you sleep. Same pattern showing up again: &lt;strong&gt;the loop is the invention, not the tool&lt;/strong&gt;. A simple cycle that compounds is worth more than a sophisticated tool that doesn't.&lt;/p&gt;

&lt;h2&gt;
  
  
  Do we even need bigger context windows?
&lt;/h2&gt;

&lt;p&gt;Here's the contrarian part. The AI industry is racing toward bigger context windows. 1 million tokens. 10 million. Bigger windows and structured memory aren't mutually exclusive, but the default assumption is clear: if we can fit everything into one prompt, the model will figure it out.&lt;/p&gt;

&lt;p&gt;Karpathy's system uses markdown files and folders.&lt;/p&gt;

&lt;p&gt;Developer &lt;a href="https://x.com/jumperz/status/2039826228224430323" rel="noopener noreferrer"&gt;JUMPERZ put it well&lt;/a&gt;: "Agents that own their own knowledge layer do not need infinite context windows. They need good file organisation and the ability to read their own indexes. Way cheaper, way more scalable, and way more inspectable than stuffing everything into one giant prompt." &lt;sup id="fnref2"&gt;2&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;There's something familiar here. I keep noticing that &lt;a href="https://dev.to/blogs/good-products-hard-to-vary"&gt;constraints beat complexity&lt;/a&gt;. In product design, in engineering, and now in AI architecture. The pneumatic tyre hasn't changed in a century. The iPhone has been the same rectangle since 2017. And maybe the answer to AI's memory problem isn't a bigger brain. It's a better filing cabinet.&lt;/p&gt;

&lt;p&gt;A 10-million-token context window is brute force. An organized knowledge base with good indexes is architecture. One scales with money. The other scales with use.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where does this go?
&lt;/h2&gt;

&lt;p&gt;Karpathy sees the endpoint. "Every question to a frontier-grade LLM spawns a team of LLMs to automate the whole thing," &lt;a href="https://x.com/karpathy/status/2039805659525644595" rel="noopener noreferrer"&gt;he wrote&lt;/a&gt;. "Iteratively construct an entire ephemeral wiki, lint it, loop a few times, then write a full report. Way beyond a &lt;code&gt;.decode()&lt;/code&gt;." &lt;sup id="fnref1"&gt;1&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;Today, it's one person and one loop building a knowledge base over weeks. Tomorrow, a swarm of agents builds an entire wiki &lt;em&gt;per question&lt;/em&gt;. Assembling, cross-referencing, linting for errors, then handing you the distilled result. Not a chat response. A researched report backed by a temporary knowledge base that was purpose-built for your specific question, then discarded.&lt;/p&gt;

&lt;p&gt;The compound interest endpoint isn't just "you never start from zero." It's "you never even have to ask twice."&lt;/p&gt;

&lt;h2&gt;
  
  
  Key takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The bottleneck in knowledge work isn't thinking. It's forgetting.&lt;/strong&gt; You've already had most of the insights you need. You just can't connect them to what you're working on now.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Karpathy's system is a loop, not a tool.&lt;/strong&gt; Raw documents → LLM-compiled wiki → Q&amp;amp;A that feeds back into the wiki → compound growth. No elaborate RAG. Just markdown and folders.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-maintaining beats self-organizing.&lt;/strong&gt; Traditional second brains decay because you're the maintenance bottleneck. This system maintains itself. Using it &lt;em&gt;is&lt;/em&gt; the upkeep.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bigger context windows might be the wrong bet.&lt;/strong&gt; Good file organization and LLM-maintained indexes can be cheaper, more scalable, and more inspectable than stuffing everything into one massive prompt.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The blank page is a symptom.&lt;/strong&gt; The disease is forgetting. The cure is a system where every question makes the next one easier.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;I break down things like this on &lt;a href="https://linkedin.com/in/monkfromearth" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;, &lt;a href="https://x.com/monkfromearth" rel="noopener noreferrer"&gt;X&lt;/a&gt;, and &lt;a href="https://instagram.com/monkfrom.earth" rel="noopener noreferrer"&gt;Instagram&lt;/a&gt;. Usually shorter, sometimes as carousels. If this resonated, you'd probably like those too.&lt;/p&gt;




&lt;ol&gt;

&lt;li id="fn1"&gt;
&lt;p&gt;&lt;a href="https://x.com/karpathy/status/2039805659525644595" rel="noopener noreferrer"&gt;Andrej Karpathy on X: LLM Knowledge Bases&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn2"&gt;
&lt;p&gt;&lt;a href="https://x.com/jumperz/status/2039826228224430323" rel="noopener noreferrer"&gt;JUMPERZ on X: commentary on Karpathy's knowledge base system&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>productivity</category>
      <category>agents</category>
    </item>
  </channel>
</rss>
