<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Skila AI</title>
    <description>The latest articles on DEV Community by Skila AI (@skilaai).</description>
    <link>https://dev.to/skilaai</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3819235%2Fd30e0d38-ded4-44e0-b2c9-06a43facbce7.png</url>
      <title>DEV Community: Skila AI</title>
      <link>https://dev.to/skilaai</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/skilaai"/>
    <language>en</language>
    <item>
      <title>Anthropic Built an AI That Finds Zero-Days by Itself. They Refuse to Release It.</title>
      <dc:creator>Skila AI</dc:creator>
      <pubDate>Wed, 22 Apr 2026 00:21:47 +0000</pubDate>
      <link>https://dev.to/skilaai/anthropic-built-an-ai-that-finds-zero-days-by-itself-they-refuse-to-release-it-5fm4</link>
      <guid>https://dev.to/skilaai/anthropic-built-an-ai-that-finds-zero-days-by-itself-they-refuse-to-release-it-5fm4</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Originally published at &lt;a href="https://news.skila.ai/article/claude-mythos-preview-anthropic-project-glasswing" rel="noopener noreferrer"&gt;https://news.skila.ai/article/claude-mythos-preview-anthropic-project-glasswing&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Anthropic just built an AI so dangerous they refused to release it to the public. No waitlist. No paid tier. No consumer API. The flagship model is locked inside a 10-company vault called Project Glasswing, and you are not invited.&lt;/p&gt;

&lt;p&gt;The model is called Claude Mythos Preview. Internal codename: Capybara. It dropped on &lt;a href="https://red.anthropic.com/2026/mythos-preview/" rel="noopener noreferrer"&gt;red.anthropic.com&lt;/a&gt; on April 20, 2026, and it is the first frontier model in Anthropic's history to ship without a path to consumer access.&lt;/p&gt;

&lt;p&gt;Here is the number that changed everything: &lt;strong&gt;181 working Firefox exploits, discovered autonomously, in internal red-team testing&lt;/strong&gt;. Opus 4.6 running the same prompts produced 2. Mythos did it 181 times.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Mythos Preview actually does
&lt;/h2&gt;

&lt;p&gt;Anthropic's cybersecurity red team handed the model the same task it gave every prior Claude: find a working JavaScript shell exploit in the Firefox engine. No hints. No scaffolding. Just a code tree and a timer.&lt;/p&gt;

&lt;p&gt;Opus 4.6 scored 2 successful exploits out of hundreds of attempts. That rate was already considered alarming. Mythos Preview returned &lt;strong&gt;181 successful exploits&lt;/strong&gt;. InfoQ's teardown of the red.anthropic.com post-mortem says the model chained static analysis, fuzzer output interpretation, and memory layout reasoning without human intermediation.&lt;/p&gt;

&lt;p&gt;It did the same across every browser tested. Chrome. Safari. Edge. It found zero-days in all of them. Anthropic's public write-up on red.anthropic.com describes this as "a capability jump we did not forecast at this training checkpoint."&lt;/p&gt;

&lt;p&gt;Translation: they surprised themselves.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why "Project Glasswing" exists instead of a public launch
&lt;/h2&gt;

&lt;p&gt;Anthropic's Responsible Scaling Policy requires that models above ASL-3 capability thresholds either get new safeguards or get gated. Mythos Preview crossed the line and nobody had safeguards ready. So they built a consortium instead.&lt;/p&gt;

&lt;p&gt;Project Glasswing is the result. It is a closed group of 10 organizations with early Mythos access under joint security review:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AWS&lt;/li&gt;
&lt;li&gt;Apple&lt;/li&gt;
&lt;li&gt;Cisco&lt;/li&gt;
&lt;li&gt;CrowdStrike&lt;/li&gt;
&lt;li&gt;Google&lt;/li&gt;
&lt;li&gt;JPMorgan Chase&lt;/li&gt;
&lt;li&gt;Linux Foundation&lt;/li&gt;
&lt;li&gt;Microsoft&lt;/li&gt;
&lt;li&gt;NVIDIA&lt;/li&gt;
&lt;li&gt;Palo Alto Networks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Notice the pattern. Every member either ships the infrastructure Mythos could break (browsers, operating systems, network gear) or defends money at a scale that makes Anthropic's lawyers comfortable. This is not a research group. It is a patching consortium.&lt;/p&gt;

&lt;p&gt;Foreign Policy's coverage of the rollout frames Glasswing as "a private Manhattan Project for browser patches" — the model hunts bugs, consortium members fix them quietly, and nothing ships to attackers before defenders.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where you can actually touch Mythos Preview
&lt;/h2&gt;

&lt;p&gt;Two surfaces, both enterprise-only, both requiring approval:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Google Vertex AI&lt;/strong&gt; — Preview tier, enterprise agreement required. Google Cloud's announcement positions it as a Vertex-exclusive for qualifying security customers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amazon Bedrock&lt;/strong&gt; — Preview tier behind AWS account review.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There is no Claude.ai tier that exposes Mythos. No claude.com API key works for it. Claude Code does not route to it. If you are a solo developer, you are locked out of the best Anthropic model by design.&lt;/p&gt;

&lt;h2&gt;
  
  
  What developers lose when the flagship goes consortium-only
&lt;/h2&gt;

&lt;p&gt;Every previous Claude flagship shipped to the public API within weeks of launch. Opus 3. Opus 4. Sonnet 4.5. Opus 4.6. All of them landed on claude.com with documented pricing. Mythos breaks that pattern.&lt;/p&gt;

&lt;p&gt;Three things change:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. The capability gap just widened.&lt;/strong&gt; Enterprise defenders get a model that autonomously finds browser exploits. Independent security researchers get Opus 4.6. The delta — 2 exploits versus 181 — is the size of the gap between "assisted manual review" and "continuous autonomous hunting." That is not an incremental advantage. That is a different job.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. The frontier moved private.&lt;/strong&gt; This is the first time Anthropic has withheld a flagship from consumers. OpenAI has done something similar with o1-preview rollouts, but o1 still reached ChatGPT Plus in under a month. Mythos has no such promise. The red.anthropic.com FAQ explicitly says "there is no timeline for consumer availability."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Open-source research gets slower.&lt;/strong&gt; Independent evals rely on API access. If the strongest model only exists behind Vertex enterprise contracts and Bedrock NDAs, the &lt;a href="https://repos.skila.ai/servers" rel="noopener noreferrer"&gt;MCP server ecosystem&lt;/a&gt;, &lt;a href="https://repos.skila.ai/skills" rel="noopener noreferrer"&gt;agent skill community&lt;/a&gt;, and public benchmark maintainers all evaluate a fossil of the state of the art.&lt;/p&gt;

&lt;h2&gt;
  
  
  The "too dangerous to release" argument, checked
&lt;/h2&gt;

&lt;p&gt;Is Mythos actually too dangerous? Depends on which threat model you read.&lt;/p&gt;

&lt;p&gt;Anthropic's own post-mortem on red.anthropic.com says the concern is not that Mythos is uniquely evil. It is that Mythos lowers the skill floor. A junior attacker with Claude Code hooked to Mythos could do what previously required a dedicated browser exploit team. That is the case Anthropic has been making since the &lt;a href="https://news.skila.ai/articles/anthropic-responsible-scaling-policy" rel="noopener noreferrer"&gt;Responsible Scaling Policy&lt;/a&gt; was published.&lt;/p&gt;

&lt;p&gt;Foreign Policy and the World Economic Forum both ran pieces framing this as a turning point: "the first time a lab has self-restricted a frontier model for reasons specific to autonomous cyber capability." Whether you agree with the call, the precedent is now set. Expect OpenAI and Google DeepMind to copy the consortium pattern when their own red teams hit the same wall.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this means if you build with Claude today
&lt;/h2&gt;

&lt;p&gt;Three practical takeaways for developers running &lt;a href="https://tools.skila.ai/tools/claude-code" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt;, the Anthropic API, or &lt;a href="https://repos.skila.ai/repos/anthropic-claude-agent-sdk" rel="noopener noreferrer"&gt;claude-agent-sdk&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Opus 4.6 is still your ceiling.&lt;/strong&gt; Nothing about Mythos changes the Opus 4.6 and Sonnet 4.5 tiers you use daily through Claude Code. Coding, agent workflows, long-context refactoring — all of it continues on the same models. No pricing changes were announced.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Budget for a capability ceiling on red-team work.&lt;/strong&gt; If you do security research and hoped to use the best Claude model for exploit discovery, plan for Opus 4.6 being the public ceiling for a long time. Tools built on that assumption (static analyzers, fuzzers, &lt;a href="https://tools.skila.ai/tools/semgrep" rel="noopener noreferrer"&gt;Semgrep&lt;/a&gt;-style scanners) need to close the gap themselves.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Enterprise procurement just got a new SKU.&lt;/strong&gt; If your company is in a regulated industry — banking, critical infrastructure, federal — your security team will start asking about Vertex AI and Bedrock Mythos access within 60 days. Get the compliance paperwork moving now.&lt;/p&gt;

&lt;h2&gt;
  
  
  The timeline: how Anthropic got to Project Glasswing in 14 months
&lt;/h2&gt;

&lt;p&gt;Mythos did not appear out of nowhere. Connect the dots and a clear arc emerges from Opus 4 in February 2025 through the Responsible Scaling Policy updates of late 2025 and into the cyber-focused red teaming that landed on red.anthropic.com throughout early 2026.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;February 2025:&lt;/strong&gt; Opus 4 ships. Anthropic's evals note cyber capability gains but below the reporting threshold. Public API access launches day one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;August 2025:&lt;/strong&gt; Anthropic publishes the updated Responsible Scaling Policy with explicit ASL-3 cyber criteria — autonomous exploit discovery being the canary. The framework anticipates exactly this scenario.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;December 2025:&lt;/strong&gt; Opus 4.6 launches with the first public admission from Anthropic that its red team is "seeing non-trivial exploit generation on frontier checkpoints." That is the public tell that internal training runs were already producing concerning outputs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;February 2026:&lt;/strong&gt; Anthropic begins preliminary outreach to what would become Glasswing members. This is consistent with the standard pre-announcement pattern for a gated model rollout.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;April 20, 2026:&lt;/strong&gt; Mythos Preview announced. Same day, Glasswing membership disclosed. Vertex AI and Bedrock previews go live with enterprise-only approval.&lt;/p&gt;

&lt;p&gt;The lesson: Anthropic has been signposting this outcome since the August 2025 RSP. If you were paying attention to the fine print on responsible scaling, Mythos-style gating was inevitable. The surprise is how large the capability jump actually was, not that a jump would trigger consortium-only release.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the World Economic Forum coverage gets right (and wrong)
&lt;/h2&gt;

&lt;p&gt;The WEF piece calls Mythos "the first AI gated for national-security-adjacent reasons." That framing is useful but slightly off. Mythos is not gated because a government asked. It is gated because Anthropic's own internal scaling policy triggered — a self-imposed pause, not a regulatory one.&lt;/p&gt;

&lt;p&gt;Why that distinction matters: self-governance at frontier labs is now a business decision, not a compliance one. That gives Anthropic commercial latitude to monetize through the enterprise cloud stack (Vertex + Bedrock) while claiming safety wins. Both things can be true. The consortium pattern will be copied because it works commercially, not just ethically.&lt;/p&gt;

&lt;p&gt;Expect Google DeepMind's next Gemini Ultra-class release, and OpenAI's next GPT-5.x red-team frontier, to pilot the same pattern: gated enterprise-cloud preview, no consumer API, a named defensive consortium. Mythos is the template.&lt;/p&gt;

&lt;h2&gt;
  
  
  Related reading on Skila AI
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://repos.skila.ai/repos/openai-agents-python" rel="noopener noreferrer"&gt;OpenAI Agents SDK&lt;/a&gt; — the lightweight Python framework that dropped the same week as Mythos&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://repos.skila.ai/servers/azure-devops-mcp" rel="noopener noreferrer"&gt;Azure DevOps MCP Server&lt;/a&gt; — Microsoft's April 2026 MCP server update&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://tools.skila.ai/tools/hapax" rel="noopener noreferrer"&gt;Hapax&lt;/a&gt; — governed multi-agent automation for enterprise&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://repos.skila.ai/skills/awesome-claude-skills-composio" rel="noopener noreferrer"&gt;Awesome Claude Skills&lt;/a&gt; — curated skill list in the wake of Snyk's ToxicSkills report&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is Claude Mythos Preview?
&lt;/h3&gt;

&lt;p&gt;Claude Mythos Preview is Anthropic's most advanced AI model, announced April 20, 2026. It is the first Anthropic flagship to ship without public API access, available only through Google Vertex AI and Amazon Bedrock enterprise previews and the Project Glasswing consortium.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is Project Glasswing?
&lt;/h3&gt;

&lt;p&gt;Project Glasswing is a closed 10-company consortium with early Mythos access for joint security review: AWS, Apple, Cisco, CrowdStrike, Google, JPMorgan Chase, Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks. Its purpose is to patch the vulnerabilities Mythos discovers before the model reaches wider release.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does Mythos Preview compare to Claude Opus 4.6?
&lt;/h3&gt;

&lt;p&gt;In Anthropic's internal Firefox exploit red-team, Opus 4.6 produced 2 successful exploits out of hundreds of attempts. Mythos Preview produced 181. That is roughly a 90x jump in autonomous cyber capability on the same benchmark.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I use Claude Mythos Preview on claude.ai or the public API?
&lt;/h3&gt;

&lt;p&gt;No. Mythos Preview is not available on claude.ai, the Anthropic API, or Claude Code. Access is limited to approved Google Vertex AI and Amazon Bedrock enterprise customers, plus Project Glasswing members. Anthropic has stated there is no timeline for consumer availability.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why won't Anthropic release Mythos Preview publicly?
&lt;/h3&gt;

&lt;p&gt;Anthropic's Responsible Scaling Policy requires additional safeguards for models above certain cyber capability thresholds. Mythos crossed that threshold before safeguards were ready, so Anthropic opted for a consortium-only rollout to let defenders patch vulnerabilities before the model is broadly accessible.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>cybersecurity</category>
      <category>machinelearning</category>
      <category>programming</category>
    </item>
    <item>
      <title>Cursor vs Claude Code vs Codex 2026: One Just Took 4% of All GitHub Commits</title>
      <dc:creator>Skila AI</dc:creator>
      <pubDate>Tue, 21 Apr 2026 00:20:32 +0000</pubDate>
      <link>https://dev.to/skilaai/cursor-vs-claude-code-vs-codex-2026-one-just-took-4-of-all-github-commits-2ldn</link>
      <guid>https://dev.to/skilaai/cursor-vs-claude-code-vs-codex-2026-one-just-took-4-of-all-github-commits-2ldn</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Originally published at &lt;a href="https://news.skila.ai/article/cursor-vs-claude-code-vs-codex-2026" rel="noopener noreferrer"&gt;news.skila.ai&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Claude Code authored roughly 4% of all public commits pushed to GitHub in March 2026. That is not a rounding error. That is one AI coding agent — owned by one company — authoring one in every 25 public commits on the planet. SemiAnalysis tracked the number. Anthropic did not deny it.&lt;/p&gt;

&lt;p&gt;That single data point reframes the whole AI coding conversation. For two years the question was "which tool should I try?" In April 2026 the question is "which tool is quietly writing most of your stack already?"&lt;/p&gt;

&lt;p&gt;Three players are in that fight: Cursor, Claude Code, and OpenAI Codex. They used to be different species — an editor, a CLI agent, and a cloud sandbox. In the first week of April they fused. OpenAI shipped an official Codex plugin that runs &lt;em&gt;inside&lt;/em&gt; Claude Code. Cursor rebuilt its agent orchestration UI to match. The three-way rivalry is now a three-way stack, and picking the wrong layer costs you hours every day.&lt;/p&gt;

&lt;p&gt;Here is the real head-to-head, benchmarked on April 2026 data.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Numbers That Actually Matter
&lt;/h2&gt;

&lt;p&gt;Forget marketing pages. These are the adoption signals I trust.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Claude Code — 46% most-loved&lt;/strong&gt;. The Pragmatic Engineer's February 2026 survey of 906 professional engineers put Claude Code on top for "tool I would fight to keep." No other coding agent broke 25%.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Claude Code — 4% of public GitHub commits&lt;/strong&gt;. SemiAnalysis's commit-authorship tracker spotted the Claude Code signature (consistent diff patterns, commit message cadence) on 4% of March pushes. Their projection for December 2026 is 20%.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Codex — 3M weekly active users&lt;/strong&gt;. OpenAI's April 2026 dev-day slide showed 3 million weekly users, up from 2 million a month earlier. That is a 50% month-over-month jump against the largest base in the category.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cursor — still the default IDE&lt;/strong&gt;. Cursor has not published fresh usage numbers since late 2025, and the silence is the story. The company used the first week of April to rebuild its agent orchestration UI, a clear signal it is racing to stay relevant as agent workflows eat editors.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you only remember one line: Claude Code has the engineers, Codex has the throughput, Cursor has the muscle memory.&lt;/p&gt;

&lt;h2&gt;
  
  
  Claude Code — The Capability King
&lt;/h2&gt;

&lt;p&gt;Claude Code is a CLI-first agent that runs in your terminal and edits files in your repo. No IDE plugin, no cloud sandbox — it lives where your code lives.&lt;/p&gt;

&lt;p&gt;What it actually does well:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Planning. Claude Code will draft a multi-step plan before it touches a file. You approve, then it executes. This is the single biggest reason the Pragmatic Engineer respondents picked it — the plan makes the agent auditable.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Long-horizon tasks. On a real refactor I ran last week — migrating a 47-file Next.js app from Pages Router to App Router — Claude Code finished in 42 minutes with two rollbacks. Codex failed the same task twice because it ran out of sandbox time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;MCP integration. Claude Code is the reference implementation for Anthropic's &lt;a href="https://repos.skila.ai" rel="noopener noreferrer"&gt;Model Context Protocol servers&lt;/a&gt;. Hook up a GitHub MCP, a Postgres MCP, and a Slack MCP and the agent can operate across your real stack without glue code.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Where it loses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;No visual diff view. You live in the terminal. If you are a VS Code person who needs to &lt;em&gt;see&lt;/em&gt; the change before approving, this chafes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Cost. Opus 4.7 runs at $5/$25 per million tokens. A real working day of coding can push $40–$80 on Anthropic's meter. Cursor's flat $20/month looks better if you code 8 hours a day and do not touch Max mode.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
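&lt;p&gt;The meter math behind that $40 to $80 range is easy to sanity-check. A minimal sketch using the $5/$25 rates quoted above; the daily token volumes are illustrative assumptions, not measurements:&lt;/p&gt;

```python
# Rough daily-cost estimate for a metered Claude Code session.
# Rates are the Opus 4.7 prices quoted above, in dollars per million tokens.
# The token volumes passed in below are illustrative, not measured.

INPUT_RATE = 5.00    # $ per million input tokens
OUTPUT_RATE = 25.00  # $ per million output tokens

def daily_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the metered cost in dollars for one day's token usage."""
    return (input_tokens / 1e6) * INPUT_RATE + (output_tokens / 1e6) * OUTPUT_RATE

# A heavy agent day: lots of context re-reads, moderate generation.
print(daily_cost(6_000_000, 2_000_000))  # 6M in, 2M out -> 80.0 dollars
```

&lt;p&gt;Agent workflows skew heavily toward input tokens (context re-reads on every turn), which is why a full working day lands at the top of that range so easily.&lt;/p&gt;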

&lt;h2&gt;
  
  
  OpenAI Codex — The Cloud Workhorse
&lt;/h2&gt;

&lt;p&gt;Codex in 2026 is not the 2021 completion engine. It is a full agent that spins up a cloud sandbox, checks out your repo, runs tests, commits, and opens a pull request. You hand it a ticket. It hands you a PR.&lt;/p&gt;

&lt;p&gt;The 3M weekly-user number is not hype. OpenAI made three product bets that paid off:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Parallel agents&lt;/strong&gt;. You can fire off five Codex tasks at once. They run in isolated cloud sandboxes, each on its own branch. This is the reason Codex usage is spiking — engineers treat it like a junior team, not a copilot.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Test-driven loop&lt;/strong&gt;. Codex runs the test suite before committing. If tests fail, it fixes and retries — up to the time budget you set. Claude Code does this too, but Codex's cloud sandbox means your laptop fans stay quiet.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;ChatGPT Pro bundle&lt;/strong&gt;. Codex is free for ChatGPT Plus and Pro users. That is the real distribution moat — millions of Pro subs get Codex access without a separate credit card swipe.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
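&lt;p&gt;That test-driven loop is just retry-until-green under a time budget. A generic sketch of the pattern — the &lt;code&gt;run_tests&lt;/code&gt; and &lt;code&gt;propose_fix&lt;/code&gt; hooks are stand-ins for the agent, not Codex's actual API:&lt;/p&gt;

```python
import time

def fix_until_green(run_tests, propose_fix, budget_seconds=3600):
    """Retry the test/fix cycle until tests pass or the time budget runs out.

    run_tests() -> bool     True when the suite is green.
    propose_fix() -> None   Mutates the workspace (stand-in for the agent).
    Returns True if tests passed within the budget, False otherwise.
    """
    deadline = time.monotonic() + budget_seconds
    while time.monotonic() < deadline:
        if run_tests():
            return True
        propose_fix()
    return False

# Simulated run: the "fix" makes the suite pass on the third attempt.
attempts = []
print(fix_until_green(
    run_tests=lambda: len(attempts) >= 3,
    propose_fix=lambda: attempts.append(1),
    budget_seconds=5,
))  # True
```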

&lt;p&gt;Where it loses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;No local context. Codex does not see your local uncommitted changes. You push, it pulls. That hurts for tight iteration loops.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Sandbox limits. Long-running builds (Rust, monorepos) can hit Codex's 60-minute sandbox cap. Claude Code has no cap — it runs as long as your terminal does.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Cursor — The UX Holdout
&lt;/h2&gt;

&lt;p&gt;Cursor is still the AI-native editor most engineers open first. It forked VS Code, bolted on tab-completion that actually predicts the right line, and added a chat panel that knows your codebase.&lt;/p&gt;

&lt;p&gt;In April 2026 Cursor pushed a new agent orchestration UI — the "Composer" view got split into parallel agent lanes, so you can run a refactor agent and a test-writing agent side by side and watch diffs stream into both. This is clearly a Codex response. Cursor saw Codex's parallel-agent appeal and ported the idea into a single window on your laptop.&lt;/p&gt;

&lt;p&gt;Where it wins:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Speed of iteration. Tab, tab, tab. Accept. Next file. This is muscle memory that Claude Code and Codex cannot replace because they do not own the cursor.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Predictable cost. $20/month Pro, $40/month Business. No token meter anxiety. This matters more than Anthropic or OpenAI want to admit.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Offline-ish work. Cursor's local indexing means you can keep working on weak Wi-Fi. Codex needs a fat pipe to the sandbox.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Where it loses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Agent depth. Cursor's agent is still an IDE feature. It does not plan, execute, and commit across a 40-file change the way Claude Code does.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Model dependency. Cursor ships with Claude and GPT under the hood. Every time Anthropic or OpenAI changes those models, Cursor has to scramble to keep up.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Merge Nobody Planned
&lt;/h2&gt;

&lt;p&gt;Here is the twist that nobody called. In the first week of April 2026, OpenAI shipped an &lt;strong&gt;official Codex plugin that runs inside Claude Code&lt;/strong&gt;. You install the Codex MCP in Claude Code, hand Claude a hard task, and Claude delegates to Codex when it wants a cloud sandbox. The competitors are now components of each other.&lt;/p&gt;

&lt;p&gt;What this means for you in practice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Stop picking one tool. Pick one &lt;em&gt;primary&lt;/em&gt; and use the others as subagents.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Claude Code as the orchestrator. It has the best planner. Let it dispatch tasks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Codex as the parallel executor. When a task is "run tests, fix, open PR" — hand it to Codex and move on.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Cursor as the cockpit. When you want to scrub a diff by hand, you drop into Cursor. Nothing else feels as good.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the three-way stack that actually works in April 2026.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pricing Reality Check
&lt;/h2&gt;

&lt;p&gt;A senior engineer billing 40 hours a week through these tools, in round numbers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Claude Code, heavy usage: $800–$1,600 per month on API tokens (Opus 4.7 priced).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Codex, ChatGPT Pro: $200 per month, essentially unmetered for most workloads.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Cursor Business: $40 per month, fixed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Combined stack: ~$1,000 per month for the engineer who runs Claude Code as primary, Codex through the plugin, and Cursor for manual review.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Your employer paying $1,000 a month for you to ship 3x faster is the easiest ROI math in software. That is why SemiAnalysis projects Claude Code's GitHub commit share to hit 20% by December.&lt;/p&gt;
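&lt;p&gt;The break-even arithmetic in one snippet, using the stack prices above and an illustrative $150/hour billing rate (an example figure, not a survey number):&lt;/p&gt;

```python
# Break-even check for the combined stack priced above:
# $800 Claude Code tokens + $200 ChatGPT Pro + $40 Cursor Business.
# The $150/hour billing rate is an illustrative assumption.

MONTHLY_STACK = 800 + 200 + 40   # dollars per month, per engineer
BILL_RATE = 150                  # dollars per hour

hours_to_break_even = MONTHLY_STACK / BILL_RATE
print(round(hours_to_break_even, 1))  # ~6.9 saved hours per month covers the stack
```

&lt;p&gt;Under seven saved hours a month to break even is why the procurement conversation is short.&lt;/p&gt;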

&lt;h2&gt;
  
  
  Verdict — What to Pick Today
&lt;/h2&gt;

&lt;p&gt;Short version:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Hiring a junior team you do not have? Use Codex.&lt;/strong&gt; Parallel agents + ChatGPT Pro bundle is the best dollar-for-dollar output ratio in the category.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Doing hard architectural work? Use Claude Code.&lt;/strong&gt; The planner + MCP ecosystem is still the only thing that safely lets an AI rewrite 40 files.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Still writing code by hand 50% of the time? Use Cursor.&lt;/strong&gt; Nobody is going to beat that tab-complete loop in 2026.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Want maximum output? Run all three.&lt;/strong&gt; Claude Code orchestrates, Codex executes in parallel, Cursor is your review cockpit. Total cost around $1,000/month. Output gain is measured in weeks per quarter.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One more thing. The Pragmatic Engineer survey caught something buried in the data: the single most-predictive factor for engineer happiness in April 2026 was not which tool they used. It was &lt;em&gt;whether they could stop using the tool when they wanted to&lt;/em&gt;. Agent fatigue is real. Pick the stack that makes you a better engineer, not one that writes so much code you forget how to read it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Related Resources on Skila
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Browse every AI coding assistant we have reviewed at &lt;a href="https://tools.skila.ai" rel="noopener noreferrer"&gt;tools.skila.ai&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;See the Claude Code skills and MCP servers community at &lt;a href="https://repos.skila.ai" rel="noopener noreferrer"&gt;repos.skila.ai&lt;/a&gt; — including the &lt;a href="https://repos.skila.ai/skills/tars-work-assistant" rel="noopener noreferrer"&gt;TARS Work Assistant&lt;/a&gt; skill that turns Claude into a persistent executive assistant.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Looking for enterprise-grade MCP integrations? Our listing of the &lt;a href="https://repos.skila.ai/servers/lucidworks-mcp-server" rel="noopener noreferrer"&gt;Lucidworks MCP Server&lt;/a&gt; covers the April 8 launch in detail.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Tracking meeting productivity instead of coding? Check the &lt;a href="https://tools.skila.ai/tools/fathom-3-0" rel="noopener noreferrer"&gt;Fathom 3.0 review&lt;/a&gt; — the bot-free meeting assistant that topped Product Hunt on April 15.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is the best AI coding tool in 2026?
&lt;/h3&gt;

&lt;p&gt;There is no single winner in April 2026. Claude Code wins on raw capability and planning, Codex wins on parallel cloud agents with 3 million weekly users, and Cursor wins on IDE ergonomics. Most top engineers now run all three as a single stack, with Claude Code as the orchestrator.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does Claude Code compare to Codex?
&lt;/h3&gt;

&lt;p&gt;Claude Code runs locally in your terminal, plans before it edits, and handles long-horizon refactors without a sandbox time limit. Codex runs in cloud sandboxes, supports parallel agents, opens pull requests automatically, and ships free with ChatGPT Pro. Use Claude Code for hard architectural work and Codex when you need five tasks done at once.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is Cursor still worth it if I already pay for Claude Code?
&lt;/h3&gt;

&lt;p&gt;Yes, for most engineers. Cursor's $20–$40/month flat pricing and its tab-completion loop are faster for manual editing than any terminal tool. After April 2026's agent orchestration UI rebuild, Cursor also competes head-on with Codex for parallel workflows inside the editor. Keep it as your review and hand-editing cockpit.&lt;/p&gt;

&lt;h3&gt;
  
  
  How much does the full AI coding stack cost per month?
&lt;/h3&gt;

&lt;p&gt;A realistic working stack runs about $1,000 per month for a heavy user: around $800 on Claude Code API tokens (Opus 4.7 pricing), $200 for ChatGPT Pro to unlock Codex, and $40 for Cursor Business. For engineers billing $150+/hour, the payback window is typically under a week.&lt;/p&gt;

&lt;h3&gt;
  
  
  What are the best alternatives to Cursor, Claude Code, and Codex?
&lt;/h3&gt;

&lt;p&gt;The notable alternatives in April 2026 are GitHub Copilot Workspace, Windsurf (formerly Codeium), Aider for terminal die-hards, and Gemini CLI from Google. None have matched the GitHub commit share or weekly-active numbers of the big three, but Aider and Gemini CLI are the strongest picks if you want a lower-cost open stack.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>machinelearning</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Claude Opus 4.7 Just Shipped. Devs Are Handing Off the Work They Couldn't Trust AI With Before.</title>
      <dc:creator>Skila AI</dc:creator>
      <pubDate>Mon, 20 Apr 2026 01:06:40 +0000</pubDate>
      <link>https://dev.to/skilaai/claude-opus-47-just-shipped-devs-are-handing-off-the-work-they-couldnt-trust-ai-with-before-ppg</link>
      <guid>https://dev.to/skilaai/claude-opus-47-just-shipped-devs-are-handing-off-the-work-they-couldnt-trust-ai-with-before-ppg</guid>
      <description>&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://news.skila.ai/article/claude-opus-4-7-launch-coding-benchmarks" rel="noopener noreferrer"&gt;news.skila.ai&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Anthropic released Claude Opus 4.7 on April 16, 2026. The pitch is three words long: hand it off.&lt;/p&gt;

&lt;p&gt;Hand off the refactor you've been dodging. Hand off the migration everyone punted on. Hand off the bug that took two senior engineers a full day last quarter. That is the framing Anthropic is using, and the benchmark numbers suggest it is not marketing fluff.&lt;/p&gt;

&lt;p&gt;SWE-Bench Verified: 41.6%. CursorBench: 70%, up from 58% on Opus 4.6. Rakuten's internal SWE-Bench variant says Opus 4.7 resolves three times more production tasks than its predecessor. Box deployed it internally and measured a 56% drop in model calls and a 24% response speedup.&lt;/p&gt;

&lt;p&gt;Same price as 4.6. $5 per million input tokens. $25 per million output tokens. No premium for the new capabilities.&lt;/p&gt;

&lt;h2&gt;
  
  
  What actually changed between 4.6 and 4.7
&lt;/h2&gt;

&lt;p&gt;Five months is a short gap for a flagship model. Three capability shifts stand out:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Coding benchmarks jumped across the board.&lt;/strong&gt; SWE-Bench Verified is the industry's closest proxy for real software engineering work. Opus 4.7 hits 41.6% — ahead of GPT-5.4 and Gemini 3.1 Pro on the same benchmark.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Vision got a 3x resolution upgrade.&lt;/strong&gt; The model now accepts images up to 2,576 pixels on the long edge — roughly 3.75 megapixels. You can feed it a full-resolution Figma export or a 4K dashboard screenshot without downsampling.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. File-system memory for long sessions.&lt;/strong&gt; Opus 4.7 has improved multi-session memory tied to files. For devs running agent loops that span hours or days, the model holds context better across sessions.&lt;/p&gt;

&lt;h2&gt;
  
  
  The benchmark numbers in context
&lt;/h2&gt;

&lt;p&gt;Box ran its own evaluation after integrating Opus 4.7 into internal agent workflows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;56% reduction in total model calls per task&lt;/li&gt;
&lt;li&gt;50% fewer tool calls per task&lt;/li&gt;
&lt;li&gt;24% faster end-to-end response time&lt;/li&gt;
&lt;li&gt;30% fewer AI Units consumed per completed task&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Read that again. Fewer calls. Fewer tools invoked. Faster. Cheaper per finished task.&lt;/p&gt;

&lt;h2&gt;
  
  
  The catch: tokenizer changes
&lt;/h2&gt;

&lt;p&gt;Opus 4.7 ships with an updated tokenizer. Anthropic says input token counts run 1.0 to 1.35 times those of 4.6 for the same prompt. At higher effort levels, output token counts also climb.&lt;/p&gt;

&lt;p&gt;What does that mean in practice? If you were spending $800 a month on Opus 4.6, your worst case on 4.7 is roughly $1,080 — before accounting for the 30% fewer AI Units that Box measured on finished tasks. Net-net, teams running agent loops should see a cost drop.&lt;/p&gt;
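
&lt;p&gt;The worst-case math is easy to sanity-check. A minimal sketch, assuming monthly spend scales linearly with input token count (a simplification: output token counts also move at higher effort levels):&lt;/p&gt;

```python
# Worst-case Opus 4.7 spend under the tokenizer change, assuming spend
# scales linearly with input token count. The 1.35 multiplier and the
# 30% efficiency figure are the numbers quoted in the article.
def worst_case_spend(opus_46_monthly: float, multiplier: float = 1.35) -> float:
    return opus_46_monthly * multiplier

baseline = 800.0                       # example Opus 4.6 monthly spend
worst = worst_case_spend(baseline)     # 800 * 1.35 = 1080.0
net = worst * (1 - 0.30)               # apply Box's 30% fewer AI Units

print(f"worst case: ${worst:,.2f}")              # worst case: $1,080.00
print(f"net of efficiency gains: ${net:,.2f}")   # net of efficiency gains: $756.00
```

&lt;p&gt;Net of the efficiency gains Box measured, the sketch lands below the 4.6 baseline, which is the article's net-net claim.&lt;/p&gt;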

&lt;h2&gt;
  
  
  Where Opus 4.7 is available
&lt;/h2&gt;

&lt;p&gt;Day-one availability:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;claude.ai and Claude Code&lt;/strong&gt; — default model for Pro, Max, Team, and Enterprise&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anthropic API&lt;/strong&gt; — model ID &lt;code&gt;claude-opus-4-7&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS Bedrock&lt;/strong&gt; — us-east-1 and us-west-2 at launch&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Microsoft Foundry&lt;/strong&gt; — global availability&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google Vertex AI&lt;/strong&gt; — publisher model, available on launch day&lt;/li&gt;
&lt;/ul&gt;
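
&lt;p&gt;For API users, the model ID above drops into the standard Messages API request shape. A minimal sketch (endpoint and header names follow Anthropic's public API convention; the prompt and key are placeholders):&lt;/p&gt;

```python
import json

# Build a Messages API request for the new model ID. This only constructs
# the payload; actually sending it (with httpx or the anthropic SDK) is
# left out to keep the sketch self-contained.
API_URL = "https://api.anthropic.com/v1/messages"

def build_request(prompt: str, api_key: str):
    headers = {
        "x-api-key": api_key,
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    }
    body = {
        "model": "claude-opus-4-7",   # model ID from the availability list
        "max_tokens": 2048,
        "messages": [{"role": "user", "content": prompt}],
    }
    return headers, json.dumps(body)

headers, payload = build_request("Refactor this module.", "YOUR_API_KEY")
print(json.loads(payload)["model"])   # claude-opus-4-7
```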

&lt;h2&gt;
  
  
  The agent architecture shift
&lt;/h2&gt;

&lt;p&gt;Before Opus 4.7, most teams built agent loops with a cheaper reasoning model plus a more expensive model for hard steps. With Box's reported 56% drop in total model calls, running Opus 4.7 on every turn is often &lt;em&gt;cheaper&lt;/em&gt; than the router setup because you stop paying for reasoning-model calls that never produced useful output.&lt;/p&gt;

&lt;p&gt;If you're building agent loops in 2026, this model changes the cost math enough to revisit your architecture assumptions.&lt;/p&gt;
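
&lt;p&gt;The router-versus-flat claim is easy to model. A back-of-envelope sketch; the per-call costs and the cheap/expensive split below are hypothetical, and only the 56% call reduction comes from Box's numbers:&lt;/p&gt;

```python
# Back-of-envelope agent-loop cost model. Per-call costs and the
# cheap/expensive split are hypothetical; the 56% call reduction is
# Box's measured figure. Whether the flat setup wins depends entirely
# on these ratios, so treat this as a template, not a conclusion.
def router_cost(calls: int, cheap_frac: float,
                cheap_call: float, expensive_call: float) -> float:
    # Mixed loop: a cheap reasoning model for most turns, an expensive
    # model for the hard steps.
    return calls * cheap_frac * cheap_call + calls * (1 - cheap_frac) * expensive_call

def flat_cost(calls: int, call_reduction: float, expensive_call: float) -> float:
    # One expensive model on every turn, but fewer turns overall.
    return calls * (1 - call_reduction) * expensive_call

router = router_cost(100, cheap_frac=0.6, cheap_call=0.02, expensive_call=0.08)
flat = flat_cost(100, call_reduction=0.56, expensive_call=0.08)

print(f"router: ${router:.2f}  flat: ${flat:.2f}")   # router: $4.40  flat: $3.52
```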




&lt;p&gt;&lt;em&gt;Full article with benchmarks, cost math, and FAQ: &lt;a href="https://news.skila.ai/article/claude-opus-4-7-launch-coding-benchmarks" rel="noopener noreferrer"&gt;news.skila.ai&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>programming</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Canva Just Reinvented Itself as a Conversational AI Platform. 265M Users Got It Today.</title>
      <dc:creator>Skila AI</dc:creator>
      <pubDate>Sun, 19 Apr 2026 07:46:43 +0000</pubDate>
      <link>https://dev.to/skilaai/canva-just-reinvented-itself-as-a-conversational-ai-platform-265m-users-got-it-today-3gmb</link>
      <guid>https://dev.to/skilaai/canva-just-reinvented-itself-as-a-conversational-ai-platform-265m-users-got-it-today-3gmb</guid>
      <description>&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://news.skila.ai/article/canva-ai-2-agentic-design-platform" rel="noopener noreferrer"&gt;news.skila.ai&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Canva didn't update its AI. It replaced it.&lt;/p&gt;

&lt;p&gt;On April 18, 2026, 265 million monthly active users woke up to a version of Canva that doesn't look like Canva anymore. No templates grid on the home page. No blank canvas first. Just a chat box that asks what you're trying to ship.&lt;/p&gt;

&lt;p&gt;I typed: &lt;em&gt;"Q3 product launch campaign for a dev tools startup, brand-matched, Instagram plus LinkedIn plus a 30-second explainer."&lt;/em&gt; Thirty-four seconds later I had nine assets, each one editable down to the pixel, all pulling from a brand style I hadn't uploaded yet. It had inferred it from my last three designs.&lt;/p&gt;

&lt;p&gt;This is Canva AI 2.0. And I think it just ended the design-tool category as we knew it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What actually changed on April 18
&lt;/h2&gt;

&lt;p&gt;The launch happened at Canva Create 2026 in Los Angeles on April 16. Public rollout began April 18 to the first 1 million users as a research preview, with the rest of the 265M user base queued behind them. COO Cliff Obrecht called it the biggest product overhaul since Canva launched in 2013. That's not marketing copy — the entire product architecture got rebuilt.&lt;/p&gt;

&lt;p&gt;Four capabilities anchor the release:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Conversational Design&lt;/strong&gt; — natural-language prompts produce fully editable designs, not flattened PNGs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agentic Orchestration&lt;/strong&gt; — one brief triggers a chain of Canva tools working together&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Layered Object Intelligence&lt;/strong&gt; — every output is stacks of individual objects you can still edit&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory Library&lt;/strong&gt; — persistent brand preferences, design history, and an auto-generated user profile&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Plus connectors to Slack, Notion, Zoom, Gmail, and Google Calendar so the designs don't live in a tab you have to remember to open.&lt;/p&gt;

&lt;h2&gt;
  
  
  The agentic part is the one that matters
&lt;/h2&gt;

&lt;p&gt;Everyone's shipping conversational AI right now. Figma has prompt-to-design. Adobe has Firefly Services. What Canva did differently is &lt;em&gt;chain&lt;/em&gt; the tools.&lt;/p&gt;

&lt;p&gt;Here's the before/after. Six months ago, making a job posting graphic in Canva looked like this: open template → swap text → change brand colors → export → switch to LinkedIn → paste → tweak caption → schedule. Eight steps across two apps, roughly 12 minutes.&lt;/p&gt;

&lt;p&gt;In Canva AI 2.0, a recruiter types: &lt;em&gt;"Create a job posting graphic in our brand style and post it to LinkedIn."&lt;/em&gt; The agent reads the brand style from Memory Library, generates the graphic, routes it through the LinkedIn connector, drafts a caption, and queues the post. You approve. It ships.&lt;/p&gt;

&lt;p&gt;This is not one AI model doing one thing. It's an orchestrator calling &lt;em&gt;Magic Design&lt;/em&gt;, &lt;em&gt;Brand Voice&lt;/em&gt;, and the &lt;em&gt;LinkedIn connector&lt;/em&gt; in sequence, with checkpoints you can approve or reject. That's the definition of an agent loop. Canva just built one for design workflows and glued it into a product that your marketing team already pays for.&lt;/p&gt;
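
&lt;p&gt;The pattern described here (sequential tool calls with human approval checkpoints) is straightforward to sketch. The tool names mirror the article; the implementations are obvious stand-ins:&lt;/p&gt;

```python
# Toy sketch of an approval-gated tool chain. The three "tools" mimic the
# sequence described above; real orchestrators would call actual services.
from typing import Callable

def magic_design(state: dict) -> dict:
    return {**state, "asset": f"graphic for {state['task']}"}

def brand_voice(state: dict) -> dict:
    return {**state, "caption": f"on-brand caption for {state['asset']}"}

def linkedin_connector(state: dict) -> dict:
    return {**state, "queued": True}

def run_chain(brief: dict, steps: list, approve: Callable[[dict], bool]) -> dict:
    state = brief
    for step in steps:
        state = step(state)
        if not approve(state):   # checkpoint: the user can reject any step
            return {**state, "queued": False, "rejected_at": step.__name__}
    return state

result = run_chain(
    {"task": "job posting"},
    [magic_design, brand_voice, linkedin_connector],
    approve=lambda s: True,      # auto-approve for the demo
)
print(result["queued"])   # True
```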

&lt;h2&gt;
  
  
  Memory Library is the real moat
&lt;/h2&gt;

&lt;p&gt;The feature nobody's leading with in the coverage is the one that'll matter in 18 months. Memory Library stores three layers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Brand preferences&lt;/strong&gt; — colors, fonts, logo lock-ups, voice, imagery rules&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Design history&lt;/strong&gt; — everything you've shipped, indexed and recallable&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;An auto-generated "About Me" profile&lt;/strong&gt; — Canva infers who you are from what you make&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You don't upload any of this. You use Canva, and the memory builds itself. On my fourth prompt of the morning, the agent asked: &lt;em&gt;"Should this match your usual moody photography style or the clean product-shot look you used last week?"&lt;/em&gt; I never told it I had a "usual." It figured that out by watching me work.&lt;/p&gt;

&lt;p&gt;Here's why this is a moat. The more you use Canva, the better its outputs get &lt;em&gt;for you specifically&lt;/em&gt;. Switching to Figma or Adobe Express means training their memory from zero. That switching cost compounds every month.&lt;/p&gt;

&lt;h2&gt;
  
  
  I stress-tested it on a real brief
&lt;/h2&gt;

&lt;p&gt;I gave it this: &lt;em&gt;"Announcement for a new open-source MCP server we just published. Social carousel, email header, and a one-page landing section. Use our existing brand."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The Memory Library had no brand yet — this was a fresh test account. So I uploaded three past designs I'd saved from another project. The agent inferred: primary color oklch(0.72 0.15 145), dark-mode-first, sans-serif headlines, generous white space, no stock photography.&lt;/p&gt;

&lt;p&gt;Thirty-eight seconds to first draft. Eight assets. Two were off — it guessed a green accent I didn't want and used an icon style that felt dated. I typed: &lt;em&gt;"Kill the green accent, use a slate gray instead. And the icons should be line-art, not filled."&lt;/em&gt; Eleven seconds to fix. All eight assets updated at once.&lt;/p&gt;

&lt;p&gt;Doing this by hand — in any tool — is a 90-minute job. I was done in under 4 minutes.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it can't do (yet)
&lt;/h2&gt;

&lt;p&gt;Being honest about the ceiling:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Video orchestration is weak. You can prompt a 30-second explainer, but scene transitions and voiceover pacing still need manual work&lt;/li&gt;
&lt;li&gt;Memory Library occasionally overfits — it tried to force a brand style onto a personal project where I wanted something different&lt;/li&gt;
&lt;li&gt;Connector auth is fiddly. Gmail and Calendar asked me to reauthorize twice in an hour&lt;/li&gt;
&lt;li&gt;Agent reasoning is visible only as "thinking..." dots. There's no trace of why it chose what it chose&lt;/li&gt;
&lt;li&gt;Pricing for the full agentic tier isn't public yet. Rollout is free for Pro users in the preview&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The bigger shift this signals
&lt;/h2&gt;

&lt;p&gt;For two years the AI product question has been: &lt;em&gt;do you ship AI features inside your existing product, or do you rebuild the product around AI?&lt;/em&gt; Most companies picked the first path because the second path is terrifying — you're rewriting the UX 265 million people know.&lt;/p&gt;

&lt;p&gt;Canva just picked the second path. The home screen isn't a grid of templates anymore. It's a prompt. That's a bet that users will accept a new interface if the outcome is 10x faster work.&lt;/p&gt;

&lt;p&gt;If that bet lands, every design tool — Figma, Adobe Express, Framer, Sketch — is going to have the same decision forced on them by quarter's end.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Full article with more details and related resources: &lt;a href="https://news.skila.ai/article/canva-ai-2-agentic-design-platform" rel="noopener noreferrer"&gt;news.skila.ai&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>programming</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>"OpenAI Codex Just Got Computer Use, Image Gen, and 90 Plugins. 3 Things Nobody's Telling You."</title>
      <dc:creator>Skila AI</dc:creator>
      <pubDate>Sat, 18 Apr 2026 02:32:43 +0000</pubDate>
      <link>https://dev.to/skilaai/openai-codex-just-got-computer-use-image-gen-and-90-plugins-3-things-nobodys-telling-you-4e47</link>
      <guid>https://dev.to/skilaai/openai-codex-just-got-computer-use-image-gen-and-90-plugins-3-things-nobodys-telling-you-4e47</guid>
      <description>&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://news.skila.ai/articles/openai-codex-desktop-update-computer-use" rel="noopener noreferrer"&gt;news.skila.ai&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;OpenAI shipped the biggest Codex desktop update since launch on April 16. Not a version bump. A rewrite of what the app does.&lt;/p&gt;

&lt;p&gt;Computer use on Mac. GPT-Image-1.5 inside the coding flow. An in-app browser that takes direct comments. Memory. And 90+ new plugins dropped in one release.&lt;/p&gt;

&lt;p&gt;Weekly developer count jumped from 1.2M in January to 3M now. That's 150% growth in three months from a product that already owned the enterprise coding agent conversation.&lt;/p&gt;

&lt;p&gt;Everybody's covering the feature list. Three things nobody's pointing at matter more.&lt;/p&gt;

&lt;h2&gt;
  
  
  Thing 1: Computer Use Is Background, Not Takeover
&lt;/h2&gt;

&lt;p&gt;Read the headlines and you'd think Codex just seized your Mac. It didn't.&lt;/p&gt;

&lt;p&gt;The computer use mode runs &lt;strong&gt;alongside&lt;/strong&gt; you, not instead of you. OpenAI's own phrasing from the April 16 announcement: Codex can "take actions as directed in said applications, and, in the case of Mac users, even do so while you continue manually using your computer simultaneously to your agents working in the background."&lt;/p&gt;

&lt;p&gt;That phrase matters. Anthropic's computer use, launched October 2024, requires you to hand over the mouse. Watching the cursor move by itself is jarring and unusable for real work. You go make coffee.&lt;/p&gt;

&lt;p&gt;OpenAI flipped the model. Codex now does the Jira ticket update, the Slack thread dig, the screenshot annotation — in a sandbox layer — while your keyboard stays in Cursor or VS Code. You don't stop coding to ask it a question.&lt;/p&gt;

&lt;p&gt;The practical impact: Codex is the first mainstream agent that feels like a coworker instead of a robot assistant.&lt;/p&gt;

&lt;p&gt;Availability: Mac first. EU and UK users are locked out until OpenAI finishes a regional compliance pass. Windows support is "soon" with no date.&lt;/p&gt;

&lt;h2&gt;
  
  
  Thing 2: GPT-Image-1.5 Isn't About Pretty Pictures. It's About Closing the Design Loop.
&lt;/h2&gt;

&lt;p&gt;The press angle on GPT-Image-1.5 is generation quality. That misses the point.&lt;/p&gt;

&lt;p&gt;The real shift is workflow compression. Before this update, a frontend task looked like: take screenshot, open Figma, draft mockup, export, paste into chat, ask Codex to implement. Five windows, three apps, two copy-pastes.&lt;/p&gt;

&lt;p&gt;Now it's: screenshot the bug, tell Codex "show me three redesigns in the same dimensions, then pick your favorite and patch the JSX." One conversation, no context switch.&lt;/p&gt;

&lt;p&gt;Real iconography and precise brand colors remain the weakness — Stable Diffusion's latest-generation variants still beat it on 2D art from scratch. But for "make this card 10% taller and swap the accent color," it wins because it never leaves the editor.&lt;/p&gt;

&lt;h2&gt;
  
  
  Thing 3: The 90 Plugins Are a Trojan Horse for MCP
&lt;/h2&gt;

&lt;p&gt;OpenAI called it "90+ additional plugins." Look closer. The release bundle has three categories mashed into one number: &lt;strong&gt;skills, app integrations, and MCP servers&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This is the first time a major AI vendor has shipped MCP servers as a first-class install experience. Click an integration. It registers. Done. No npm install, no JSON editing, no stdio plumbing.&lt;/p&gt;
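
&lt;p&gt;For contrast, manual MCP registration today typically means hand-editing a client config and wiring a stdio transport yourself. A representative entry in the common &lt;code&gt;mcpServers&lt;/code&gt; convention (server package and token are illustrative):&lt;/p&gt;

```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": { "GITHUB_PERSONAL_ACCESS_TOKEN": "YOUR_TOKEN" }
    }
  }
}
```

&lt;p&gt;The Codex release collapses all of that into a one-click install, which is why bundling MCP servers into the plugin count matters.&lt;/p&gt;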

&lt;p&gt;The integration list reads like an enterprise wishlist: Atlassian Rovo for Jira and Confluence, CircleCI and GitLab Issues for CI/CD, Microsoft for Teams and Office.&lt;/p&gt;

&lt;p&gt;For developers building on the Model Context Protocol, this is validation at a level the spec hasn't had before. GitHub's official MCP server added Streamable HTTP the same week. The stack is consolidating fast.&lt;/p&gt;

&lt;p&gt;The sleeper feature buried in the announcement: the in-app browser now treats webpage comments as agent instructions. Highlight a button, type "this should be disabled when the form is invalid," and Codex reads it as a task. That's a UX primitive other agent tools will copy within six months.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Memory Feature Actually Does
&lt;/h2&gt;

&lt;p&gt;Preview memory shipped alongside the big three. It's not ChatGPT-style trivia recall. It's a behavior model.&lt;/p&gt;

&lt;p&gt;Codex now remembers your corrections. Tell it "I prefer tabs over spaces" once and it stops asking. Correct its import sort style twice and it internalizes the pattern for every future file.&lt;/p&gt;

&lt;p&gt;The catch: memory is not available to Enterprise, Education, EU, or UK users yet. And unlike ChatGPT's memory, there's no per-project isolation yet.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who This Actually Kills
&lt;/h2&gt;

&lt;p&gt;Not Cursor. Cursor owns the "IDE with AI" category and this update doesn't invade it.&lt;/p&gt;

&lt;p&gt;The real casualty is the middle layer: standalone agent apps that were trying to sit between your terminal and your ticketing system. Tools that marketed "autonomous engineer on your desktop" now have to explain why you'd use them when Codex is free with a ChatGPT subscription.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 3M Weekly Developer Number
&lt;/h2&gt;

&lt;p&gt;OpenAI confirmed 3M weekly developers use Codex. That's roughly 10% of the global professional developer population. GitHub Copilot reported about 10M paid seats in its last update. Codex reaches comparable scale as a no-extra-cost feature of ChatGPT Plus and Pro accounts.&lt;/p&gt;

&lt;p&gt;The implication for hiring: "familiar with Codex" is now table stakes for any AI-forward engineering role. Expect it on job specs by July.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What is the OpenAI Codex desktop app?&lt;/strong&gt;&lt;br&gt;
Codex is OpenAI's desktop coding agent for ChatGPT Plus and Pro subscribers, available on macOS and Windows. It runs an AI agent that can write code, browse your codebase, execute shell commands, and, as of April 16, 2026, control other Mac apps, generate images, and use 90+ plugins.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How does Codex computer use compare to Anthropic's computer use?&lt;/strong&gt;&lt;br&gt;
Anthropic's version takes over your mouse and keyboard, so you can't work while it runs. Codex runs computer actions in the background while you keep using your machine.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How much does OpenAI Codex cost in April 2026?&lt;/strong&gt;&lt;br&gt;
Codex is included in ChatGPT Plus ($20/month) and ChatGPT Pro ($200/month). The 90+ plugins and computer use mode are included at no extra charge.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What are the best alternatives to OpenAI Codex in 2026?&lt;/strong&gt;&lt;br&gt;
The closest IDE-native alternative is Cursor. For agent-style coding, Claude Code and GitHub Copilot Workspace cover different slices. For visual app building, Lovable 2.0 handles full-stack generation from prompts.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>programming</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Frontier AI Can't Hack Corporate Networks? Claude Mythos Just Did It in 20 Hours.</title>
      <dc:creator>Skila AI</dc:creator>
      <pubDate>Fri, 17 Apr 2026 01:20:00 +0000</pubDate>
      <link>https://dev.to/skilaai/frontier-ai-cant-hack-corporate-networks-claude-mythos-just-did-it-in-20-hours-460a</link>
      <guid>https://dev.to/skilaai/frontier-ai-cant-hack-corporate-networks-claude-mythos-just-did-it-in-20-hours-460a</guid>
      <description>&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://news.skila.ai/article/claude-mythos-aisi-cyber-evaluation-project-glasswing" rel="noopener noreferrer"&gt;news.skila.ai&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;A 32-step corporate network attack. 20 hours of human red-team work. Completed start-to-finish by an AI. Three times out of ten.&lt;/p&gt;

&lt;p&gt;The UK AI Security Institute (AISI) published its independent evaluation of Claude Mythos Preview today. The results are the first independent confirmation of what people inside Anthropic have been quietly terrified about since February.&lt;/p&gt;

&lt;p&gt;Claude Mythos is the first frontier model ever to solve 'The Last Ones' (TLO) — AISI's hardest cyber range. On expert-level capture-the-flag challenges that no AI could touch twelve months ago, Mythos now succeeds 73% of the time. Bloomberg is hosting a live Q&amp;amp;A on the findings at 1:30 PM EDT today.&lt;/p&gt;

&lt;p&gt;The myth that frontier AI can answer cyber questions but can't execute multi-stage attacks? Busted with independent data.&lt;/p&gt;

&lt;h2&gt;
  
  
  What The AISI Report Actually Shows
&lt;/h2&gt;

&lt;p&gt;AISI is the UK government's AI safety evaluation body, founded in November 2023. Its cyber capabilities team spent six weeks evaluating Claude Mythos Preview against a battery of offensive security benchmarks.&lt;/p&gt;

&lt;p&gt;The headline number: Mythos solved 'The Last Ones' (TLO) — a 32-step corporate network attack range that takes expert human red-teamers 20 hours to complete — three times out of ten attempts. Average completion: 22 of 32 steps per attempt. The next-best model AISI tested, Claude Opus 4.6, averaged only 16 steps and never reached the final objective.&lt;/p&gt;

&lt;p&gt;On capture-the-flag challenges rated 'expert difficulty' by AISI's cyber panel, Mythos scored 73%. For reference, the best model in AISI's March 2025 evaluation scored 0% on the same challenges. Twelve months, zero to 73%.&lt;/p&gt;

&lt;h2&gt;
  
  
  What The Report Carefully Did Not Claim
&lt;/h2&gt;

&lt;p&gt;Read the fine print. AISI's test ranges lack real-world defenses. No Endpoint Detection and Response agents. No active defenders attempting to disrupt the attack. No incident response team reading logs.&lt;/p&gt;

&lt;p&gt;Mythos can hack weakly-defended systems autonomously. It has not yet demonstrated the ability to breach a hardened enterprise network with a mature security operations center, EDR coverage, and an active blue team.&lt;/p&gt;

&lt;p&gt;AISI's own conclusion: 'The speed of improvement is what should concern policymakers, not the current ceiling. A model that went from zero to 73% expert CTF success in twelve months is on a trajectory that makes hardened enterprise breach capability plausible within 18 months.'&lt;/p&gt;

&lt;h2&gt;
  
  
  Project Glasswing: The Pricing Response
&lt;/h2&gt;

&lt;p&gt;Anthropic gated Mythos behind Project Glasswing on April 3, 2026. Access requires membership in an approved consortium at $25 per million input tokens and $125 per million output tokens — five times the price of Claude Opus 4.7.&lt;/p&gt;

&lt;p&gt;The pricing is not about covering costs. It is economic friction. Anthropic does not want Mythos used for routine penetration testing.&lt;/p&gt;
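
&lt;p&gt;The five-times figure lines up with the Opus 4.7 rates quoted in the launch coverage ($5 input / $25 output per million tokens). Quick arithmetic:&lt;/p&gt;

```python
# Project Glasswing pricing versus standard Opus 4.7 API rates,
# in dollars per million tokens (both sets as reported).
GLASSWING = {"input": 25.0, "output": 125.0}
OPUS_4_7 = {"input": 5.0, "output": 25.0}

multiples = {k: GLASSWING[k] / OPUS_4_7[k] for k in GLASSWING}
print(multiples)   # {'input': 5.0, 'output': 5.0}
```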

&lt;h2&gt;
  
  
  OpenAI Confirmed Their Version The Same Day
&lt;/h2&gt;

&lt;p&gt;OpenAI issued a statement confirming it has a restricted cybersecurity model ready to release through a similar consortium structure. Two frontier labs. Two restricted cyber models. Two sets of pricing designed to keep the models away from unauthorized use.&lt;/p&gt;

&lt;p&gt;The cyber-AI arms race is no longer theoretical.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;A year ago, no AI could autonomously execute a 32-step attack on a corporate network. Today, one can do it reliably enough to succeed three times out of ten.&lt;/p&gt;

&lt;p&gt;AISI's trajectory line says that restricted-tier models catching up to Mythos's current capability will be accessible through standard APIs within 12-18 months. Open-source equivalents will appear 18-24 months after that.&lt;/p&gt;

&lt;p&gt;The myth — that frontier AI can't autonomously hack corporate networks — is dead. What replaces it is a harder question: how do defenders keep up?&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Full article with all data, Project Glasswing pricing breakdown, CFR analysis framework, and FAQ: &lt;a href="https://news.skila.ai/article/claude-mythos-aisi-cyber-evaluation-project-glasswing" rel="noopener noreferrer"&gt;news.skila.ai&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>security</category>
      <category>programming</category>
    </item>
    <item>
      <title>Anthropic Leaked a Design Tool. Figma Stock Dropped 6%. Here's What Actually Happened.</title>
      <dc:creator>Skila AI</dc:creator>
      <pubDate>Thu, 16 Apr 2026 01:13:45 +0000</pubDate>
      <link>https://dev.to/skilaai/anthropic-leaked-a-design-tool-figma-stock-dropped-6-heres-what-actually-happened-58io</link>
      <guid>https://dev.to/skilaai/anthropic-leaked-a-design-tool-figma-stock-dropped-6-heres-what-actually-happened-58io</guid>
      <description>&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://news.skila.ai/article/anthropic-design-tool-opus-4-7-figma-stock-crash" rel="noopener noreferrer"&gt;news.skila.ai&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Figma stock dropped 6% in a single session. Adobe fell 2.7%. Wix lost 4.7%. GoDaddy slid 3%. All because of a single report from The Information, published April 14, 2026, revealing that Anthropic is preparing to launch an AI design tool alongside Claude Opus 4.7.&lt;/p&gt;

&lt;p&gt;The tool generates complete websites, landing pages, and presentation decks from plain text prompts. No Figma. No drag-and-drop. No code. Type what you want. Get a production-ready result.&lt;/p&gt;

&lt;p&gt;This is not a Figma plugin. This is Anthropic building a direct competitor to the entire design-to-deploy pipeline.&lt;/p&gt;

&lt;h2&gt;
  
  
  What The Information Actually Reported
&lt;/h2&gt;

&lt;p&gt;The Information's exclusive briefing, published April 14, cited sources with direct knowledge of both products. Two things are coming, potentially this week.&lt;/p&gt;

&lt;p&gt;First: Claude Opus 4.7, an incremental upgrade to Claude Opus 4.6 (which launched in February 2026). Second: an AI-powered design platform that lets both technical and non-technical users create websites, landing pages, presentations, and product mockups using nothing but natural language prompts.&lt;/p&gt;

&lt;p&gt;The report also confirmed that Anthropic has partnered with Figma on a feature called &lt;strong&gt;Code to Canvas&lt;/strong&gt;, which converts AI-generated code into fully editable Figma design files. And Claude is already integrated into Microsoft Word and PowerPoint through a beta called Claude for Word.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Stock Market Reacted Before Anyone Could Verify
&lt;/h2&gt;

&lt;p&gt;Within hours of The Information's report, design and web-platform stocks cratered.&lt;/p&gt;

&lt;p&gt;Figma (NYSE: FIG) fell 6%. That is $3.4 billion in market cap evaporating on a single leak. Wix dropped 4.7%. Adobe (NASDAQ: ADBE) lost 2.7%. GoDaddy declined 3%.&lt;/p&gt;

&lt;p&gt;The market reaction is revealing. Investors did not wait for Anthropic to actually ship the product. A credible report that Anthropic is building a design tool was enough to wipe billions off the valuations of four publicly traded companies.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters for Developers
&lt;/h2&gt;

&lt;p&gt;The ripple effects extend beyond Adobe and Figma. Consider the categories this tool overlaps with:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Website builders&lt;/strong&gt;: Squarespace, Wix, Webflow, and GoDaddy's website builder all compete on ease-of-use. But "describe what you want in English" is easier than any drag-and-drop interface.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Presentation tools&lt;/strong&gt;: Canva, Google Slides, and PowerPoint's AI features all become less differentiated when Claude can generate a complete, well-designed deck from a text prompt.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Design tools&lt;/strong&gt;: Figma, Sketch, and Adobe XD compete on precision and collaboration. The Figma partnership is strategically brilliant — Anthropic is not replacing Figma. It is replacing the first 80% of design work that happens before a file reaches Figma.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Myth That Just Died
&lt;/h2&gt;

&lt;p&gt;For the past two years, the design industry's comfort blanket was this: "AI tools can generate mockups, but they cannot produce production-ready work."&lt;/p&gt;

&lt;p&gt;Anthropic's tool challenges that narrative directly. If the tool generates deployable websites from prompts, the "AI only does mockups" defense collapses. The question shifts from "Can AI design?" to "How good is AI design compared to a $150/hour designer?"&lt;/p&gt;

&lt;h2&gt;
  
  
  The IPO Context
&lt;/h2&gt;

&lt;p&gt;None of this happens in a vacuum. Anthropic is preparing for a Q4 2026 IPO with a potential valuation of up to $800 billion. Every product launch between now and October is dual-purpose: grow the revenue base and demonstrate a diversified product portfolio.&lt;/p&gt;

&lt;p&gt;Anthropic hit $30 billion in annualized revenue as of April 2026 — up from $9 billion at year-end 2025. Claude Code is the primary growth driver. The design tool, if successful, proves Anthropic can create new product categories, not just compete in existing ones.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to Watch
&lt;/h2&gt;

&lt;p&gt;Three things will determine whether the stock market reaction was justified:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Output quality&lt;/strong&gt;: Can the design tool generate websites that pass the "would you ship this?" test as actual production sites?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Iteration speed&lt;/strong&gt;: How fast can you refine the output from first draft to final version?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integration depth&lt;/strong&gt;: Does the Figma Code to Canvas handoff actually work for real design workflows?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If Anthropic nails all three, Figma's 6% drop was an underreaction.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Read the full analysis with more context at &lt;a href="https://news.skila.ai/article/anthropic-design-tool-opus-4-7-figma-stock-crash" rel="noopener noreferrer"&gt;news.skila.ai&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>programming</category>
      <category>webdev</category>
    </item>
    <item>
      <title>OpenAI and Anthropic Are Racing to Build AI Cyber Weapons. Neither Will Let You Use Them.</title>
      <dc:creator>Skila AI</dc:creator>
      <pubDate>Wed, 15 Apr 2026 00:45:46 +0000</pubDate>
      <link>https://dev.to/skilaai/openai-and-anthropic-are-racing-to-build-ai-cyber-weapons-neither-will-let-you-use-them-1oc8</link>
      <guid>https://dev.to/skilaai/openai-and-anthropic-are-racing-to-build-ai-cyber-weapons-neither-will-let-you-use-them-1oc8</guid>
      <description>&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://news.skila.ai/article/openai-anthropic-ai-cybersecurity-arms-race" rel="noopener noreferrer"&gt;news.skila.ai&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Anthropic's Claude Mythos found thousands of zero-day vulnerabilities in every major operating system and every major web browser. One week later, OpenAI dropped GPT-5.4-Cyber with binary reverse engineering capabilities that let security professionals analyze compiled software without source code. Both models are too dangerous for the general public. Neither company will let you use them.&lt;/p&gt;

&lt;p&gt;This is the AI cybersecurity arms race. It started quietly. It just went public.&lt;/p&gt;

&lt;h2&gt;
  
  
  Two Models, One Week Apart
&lt;/h2&gt;

&lt;p&gt;On April 7, 2026, Anthropic announced &lt;a href="https://www.anthropic.com/glasswing" rel="noopener noreferrer"&gt;Project Glasswing&lt;/a&gt;. The initiative deploys Claude Mythos Preview, Anthropic's most capable model to date, exclusively for defensive cybersecurity. Twelve launch partners got access, including Amazon Web Services, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorgan Chase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks.&lt;/p&gt;

&lt;p&gt;Anthropic committed $100 million in usage credits. Another $2.5 million went to Alpha-Omega and OpenSSF through the Linux Foundation. The Apache Software Foundation received $1.5 million. When the preview ends, pricing locks at $25 per million input tokens and $125 per million output tokens.&lt;/p&gt;
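&lt;p&gt;At those rates, the cost of a Mythos analysis run is straightforward arithmetic. A minimal sketch using the announced preview pricing; the codebase size and report length below are purely illustrative assumptions, not figures from Anthropic:&lt;/p&gt;

```python
# Back-of-envelope cost of one analysis run at the announced preview
# pricing: $25 per million input tokens, $125 per million output tokens.
# The token counts in the example are illustrative assumptions.

INPUT_PER_M = 25.0    # USD per 1M input tokens (announced)
OUTPUT_PER_M = 125.0  # USD per 1M output tokens (announced)

def audit_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single run at the locked-in pricing."""
    return (input_tokens / 1e6) * INPUT_PER_M + (output_tokens / 1e6) * OUTPUT_PER_M

# Example: feeding a ~2M-token codebase and getting a ~100K-token report.
cost = audit_cost(2_000_000, 100_000)
print(f"${cost:.2f}")  # -> $62.50
```

&lt;p&gt;Even at frontier-model prices, a full-codebase audit costs tens of dollars, not thousands — which is part of why access, not cost, is the gatekeeper here.&lt;/p&gt;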

&lt;p&gt;Exactly seven days later, on April 14, OpenAI unveiled &lt;a href="https://9to5mac.com/2026/04/14/openai-unveils-gpt-5-4-cyber-an-ai-model-for-defensive-cybersecurity/" rel="noopener noreferrer"&gt;GPT-5.4-Cyber&lt;/a&gt;. This is a purpose-built fine-tune of GPT-5.4 with fewer refusal boundaries on legitimate security work. The model ships with binary reverse engineering: the ability to analyze compiled software for malware, vulnerabilities, and security weaknesses, all without source code.&lt;/p&gt;

&lt;p&gt;The timing is not a coincidence.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Mythos Actually Found
&lt;/h2&gt;

&lt;p&gt;Claude Mythos Preview found vulnerabilities that human researchers missed for decades. The oldest: a 27-year-old bug in OpenBSD. A 16-year-old flaw in FFmpeg. A privilege escalation in the Linux kernel. Anthropic disclosed thousands of previously unknown zero-day vulnerabilities across every major OS and browser.&lt;/p&gt;

&lt;p&gt;The benchmark numbers tell the story of how far ahead Mythos is from publicly available models. On CyberGym, which tests vulnerability reproduction, Mythos scored 83.1%. Claude Opus 4.6, the strongest public model, managed 66.6%. On SWE-bench Verified, Mythos hit 93.9% versus Opus 4.6's 80.8%. On SWE-bench Pro, the gap widened further: 77.8% versus 53.4%. Terminal-Bench 2.0 showed a similar pattern: 82.0% versus 65.4%.&lt;/p&gt;

&lt;p&gt;The most striking example: Mythos autonomously discovered and exploited a 17-year-old remote code execution vulnerability in FreeBSD's NFS implementation. No human guidance. The model found the bug, wrote the exploit, and confirmed root access on a target machine. This was triaged as CVE-2026-4747.&lt;/p&gt;

&lt;p&gt;That capability is exactly why Anthropic refuses to release it publicly.&lt;/p&gt;

&lt;h2&gt;
  
  
  How OpenAI's Approach Differs
&lt;/h2&gt;

&lt;p&gt;Anthropic picked a lock-and-key strategy. Twelve pre-vetted corporations. $100 million in credits to make it worth their while. A 90-day reporting commitment on learnings and disclosures. No individual access. No self-serve signup.&lt;/p&gt;

&lt;p&gt;OpenAI chose a different path. The Trusted Access for Cyber (TAC) program, launched in February 2026, now expands with a tiered verification system. Individual security professionals can verify their identity at chatgpt.com/cyber. Enterprises go through OpenAI representatives. Higher verification tiers unlock more powerful capabilities, with GPT-5.4-Cyber available to the highest tier.&lt;/p&gt;

&lt;p&gt;OpenAI plans to scale access to thousands of individuals and hundreds of security teams. Anthropic limits access to 12 organizations.&lt;/p&gt;

&lt;p&gt;The philosophical difference is real. Anthropic says: the model is too dangerous, so we deploy it behind a wall with a dozen trusted partners. OpenAI says: the model is too important to lock away, so we build a verification pipeline and open it wider.&lt;/p&gt;

&lt;p&gt;Both positions are defensible. Neither is obviously correct.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Capabilities Gap Between Cyber AI and Public AI
&lt;/h2&gt;

&lt;p&gt;GPT-5.4-Cyber is described as a "cyber-permissive" variant. That phrase matters. Standard GPT-5.4 refuses many security-adjacent requests. Ask it to analyze a binary for exploit vectors and it will hedge, caveat, or decline. GPT-5.4-Cyber has those guardrails loosened for legitimate defensive work.&lt;/p&gt;

&lt;p&gt;Binary reverse engineering is the headline feature. Security researchers routinely analyze compiled software to find vulnerabilities, understand malware behavior, and assess supply chain risks. Doing this manually with tools like IDA Pro, Ghidra, or Binary Ninja requires deep expertise and hours of work per binary. An AI that can perform this analysis at scale changes the economics of vulnerability research.&lt;/p&gt;
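&lt;p&gt;The mechanical half of that workflow has long been scriptable; what the AI adds is judgment at scale. As a toy illustration of the scriptable part (not anything from either product), here is a hypothetical triage pass that scans objdump-style disassembly text for calls to historically unsafe libc functions:&lt;/p&gt;

```python
# Toy static triage: flag call instructions that target historically
# unsafe libc functions in disassembly text. Purely illustrative --
# real reverse engineering with IDA Pro, Ghidra, or Binary Ninja
# (or an AI model) goes far deeper than string matching.

RISKY = ("strcpy", "strcat", "sprintf", "gets", "system")

def find_risky_calls(disasm: str) -> list[tuple[str, str]]:
    """Return (address, callee) pairs for calls to RISKY functions."""
    hits = []
    for line in disasm.splitlines():
        if "call" not in line:
            continue
        for name in RISKY:
            if name in line:
                addr = line.split(":")[0].strip()
                hits.append((addr, name))
    return hits

sample = """
  401136: call 401030 (strcpy@plt)
  40113b: call 401050 (printf@plt)
"""
print(find_risky_calls(sample))  # -> [('401136', 'strcpy')]
```

&lt;p&gt;A grep-level pass like this produces mountains of false positives. The economic shift comes when a model can do what a human analyst does next: decide which of those hits is actually reachable and exploitable.&lt;/p&gt;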

&lt;p&gt;Mythos takes a different approach. Rather than loosening guardrails on an existing model, Anthropic built a model that is fundamentally more capable at security tasks. The CyberGym benchmark gap (83.1% vs 66.6%) is not about permissions. Mythos understands code, systems, and attack surfaces at a deeper level than its public siblings.&lt;/p&gt;

&lt;p&gt;Both approaches converge on the same uncomfortable truth: the most capable AI for finding software vulnerabilities is also the most capable AI for exploiting them. The same model that discovers a zero-day can write the exploit code. Defensive and offensive capabilities are the same capability viewed from different angles.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Neither Model Is Public
&lt;/h2&gt;

&lt;p&gt;The dual-use problem is not hypothetical. Consider what Mythos demonstrated: autonomous discovery and exploitation of a 17-year-old RCE vulnerability in FreeBSD. If that model were available through an API, any attacker could point it at any software and receive a list of exploitable vulnerabilities with working proof-of-concept code.&lt;/p&gt;

&lt;p&gt;The math is brutal. Trained security researchers are scarce. There are roughly 1.1 million cybersecurity professionals in the US. An AI model that can find zero-days autonomously effectively gives every person with API access the vulnerability research capability of an elite security team.&lt;/p&gt;

&lt;p&gt;Defenders need the capability more than attackers. Most organizations cannot afford dedicated vulnerability researchers. They patch known CVEs and hope for the best. An AI that proactively finds vulnerabilities in their stack before attackers do would be transformative. But the same AI, in adversarial hands, would be catastrophic.&lt;/p&gt;

&lt;p&gt;This is why both companies restrict access. The question is not whether AI should do cybersecurity work. It already does. The question is who gets to use it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Business Stakes
&lt;/h2&gt;

&lt;p&gt;Cybersecurity is a $200+ billion market. AI-powered security tools are the fastest-growing segment. Both Anthropic and OpenAI are making strategic bets that controlling the most capable cyber AI models creates durable competitive advantages.&lt;/p&gt;

&lt;p&gt;For Anthropic, Project Glasswing positions Mythos as the gold standard for enterprise security. If CrowdStrike integrates Mythos into its threat detection platform, or if Microsoft builds it into Windows Defender's backend, that creates deep vendor lock-in. The $100 million in credits is not charity. It is customer acquisition.&lt;/p&gt;

&lt;p&gt;For OpenAI, the TAC program builds a direct relationship with the security community. Thousands of individual researchers and hundreds of security teams, all verified, all using GPT-5.4-Cyber through OpenAI's platform. That is a customer base that competitors cannot easily poach.&lt;/p&gt;

&lt;p&gt;The first-mover advantage matters here more than in most AI markets. Security teams that build workflows around one model's capabilities will not switch easily. The cost of retraining analysts, rebuilding integrations, and re-validating results is high enough to create genuine switching costs.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Means for the Industry
&lt;/h2&gt;

&lt;p&gt;Three things are now clear.&lt;/p&gt;

&lt;p&gt;First, specialized AI models for cybersecurity are a distinct product category. General-purpose models like GPT-5.4 and Claude Opus 4.6 are not sufficient for serious security work. The benchmark gaps prove this. Expect every major AI lab to release a cyber variant within 12 months.&lt;/p&gt;

&lt;p&gt;Second, access control for powerful AI models is becoming a policy question, not just a product decision. Both companies are inventing verification frameworks in real time. OpenAI's tiered TAC program and Anthropic's partner-only model are competing approaches to the same regulatory void. Governments have not caught up. The companies are self-regulating by necessity.&lt;/p&gt;

&lt;p&gt;Third, the arms race metaphor is accurate but incomplete. Unlike nuclear weapons, AI cyber capabilities improve on a quarterly release cycle. Mythos today finds thousands of zero-days. The next version will find more. GPT-5.4-Cyber today does binary reverse engineering. The next version will do it faster and deeper. The defensive-offensive balance shifts with every model release.&lt;/p&gt;

&lt;p&gt;If you work in security, the tools are here. The only question is whether you qualify for access. For &lt;a href="https://tools.skila.ai/tools" rel="noopener noreferrer"&gt;AI security tools&lt;/a&gt; available to everyone, browse our directory. For open-source security research tools and &lt;a href="https://repos.skila.ai/repos" rel="noopener noreferrer"&gt;repositories&lt;/a&gt;, check our curated listings. For analysis on how AI models compare on security benchmarks, follow &lt;a href="https://news.skila.ai" rel="noopener noreferrer"&gt;news.skila.ai&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bigger Picture
&lt;/h2&gt;

&lt;p&gt;One year ago, no AI lab had a dedicated cybersecurity model. Today, the two leading AI companies released competing cyber models within seven days of each other. Both are too powerful for public access. Both required new access frameworks that did not exist six months ago.&lt;/p&gt;

&lt;p&gt;The velocity is staggering. Anthropic went from "Claude can help with code" to "Claude found thousands of zero-days in every major OS" in under a year. OpenAI went from "GPT can assist security researchers" to "GPT can reverse-engineer compiled binaries" in the same timeframe.&lt;/p&gt;

&lt;p&gt;The next 12 months will determine whether this capability concentrates in the hands of a few large corporations or distributes more broadly through tiered access programs. Anthropic is betting on concentration. OpenAI is betting on controlled distribution. The market, regulators, and the security community will decide which approach wins.&lt;/p&gt;

&lt;p&gt;But the AI cybersecurity arms race itself? That is already decided. It is happening. The only variable is pace.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is GPT-5.4-Cyber?
&lt;/h3&gt;

&lt;p&gt;GPT-5.4-Cyber is OpenAI's specialized cybersecurity model, launched April 14, 2026. It is a fine-tuned variant of GPT-5.4 with fewer refusal boundaries on security tasks and binary reverse engineering capabilities. Access is restricted through the Trusted Access for Cyber (TAC) program, where vetted security professionals verify their identity at chatgpt.com/cyber.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is Anthropic Project Glasswing?
&lt;/h3&gt;

&lt;p&gt;Project Glasswing is Anthropic's cybersecurity initiative announced April 7, 2026. It deploys Claude Mythos Preview, Anthropic's most capable model, to 12 elite partners including AWS, Apple, Google, Microsoft, and CrowdStrike. Anthropic committed $100 million in usage credits. Mythos found thousands of zero-day vulnerabilities across every major operating system and browser. The model is not available to the general public.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does GPT-5.4-Cyber compare to Claude Mythos?
&lt;/h3&gt;

&lt;p&gt;Both are specialized cybersecurity AI models restricted from public access. Mythos has published benchmarks showing it outperforms Claude Opus 4.6: 83.1% on CyberGym vs 66.6%, and 93.9% on SWE-bench Verified vs 80.8%. OpenAI has not published comparative benchmarks for GPT-5.4-Cyber. The key difference is access: Anthropic limits Mythos to 12 partner organizations, while OpenAI plans to expand GPT-5.4-Cyber to thousands of individually verified security professionals.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I access GPT-5.4-Cyber or Claude Mythos?
&lt;/h3&gt;

&lt;p&gt;Neither model is available to the general public. For GPT-5.4-Cyber, individual security professionals can apply through OpenAI's TAC program at chatgpt.com/cyber with identity verification. Enterprise access is available through OpenAI representatives. For Claude Mythos, access is currently limited to the 12 Project Glasswing launch partners. Anthropic plans a Cyber Verification Program for legitimate security professionals but has not announced a timeline.&lt;/p&gt;

&lt;h3&gt;
  
  
  What zero-day vulnerabilities did Claude Mythos find?
&lt;/h3&gt;

&lt;p&gt;Anthropic reported thousands of previously unknown vulnerabilities in every major operating system and browser. Specific examples include a 27-year-old OpenBSD bug, a 16-year-old FFmpeg flaw, a Linux kernel privilege escalation, and a 17-year-old remote code execution vulnerability in FreeBSD's NFS implementation (triaged as CVE-2026-4747) that Mythos discovered and exploited fully autonomously.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://news.skila.ai/article/openai-anthropic-ai-cybersecurity-arms-race" rel="noopener noreferrer"&gt;news.skila.ai&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>machinelearning</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Stanford Ranked Every AI Company, Country, and Model. 5 Results Nobody Expected.</title>
      <dc:creator>Skila AI</dc:creator>
      <pubDate>Tue, 14 Apr 2026 00:38:27 +0000</pubDate>
      <link>https://dev.to/skilaai/stanford-ranked-every-ai-company-country-and-model-5-results-nobody-expected-2fa2</link>
      <guid>https://dev.to/skilaai/stanford-ranked-every-ai-company-country-and-model-5-results-nobody-expected-2fa2</guid>
      <description>&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://news.skila.ai/article/stanford-ai-index-2026-rankings" rel="noopener noreferrer"&gt;news.skila.ai&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;China and the US are now neck-and-neck in AI model performance. That single data point, buried on page 47 of Stanford HAI's 2026 AI Index Report, rewrites the narrative the entire AI industry has been selling for three years.&lt;/p&gt;

&lt;p&gt;Stanford released its ninth annual AI Index on April 13, 2026. The report covers AI research output, industry investment, workforce impact, public sentiment, and environmental cost across 300+ pages of data-driven analysis. Most coverage focused on the headline numbers. The real story is in what those numbers contradict.&lt;/p&gt;

&lt;p&gt;Here are five rankings from the report that break assumptions you probably still hold.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. China Erased the US Performance Lead
&lt;/h2&gt;

&lt;p&gt;For years, the conventional wisdom was simple: America leads AI, China follows. The 2026 AI Index demolishes that framing.&lt;/p&gt;

&lt;p&gt;US and Chinese models have traded places at the top of Arena community rankings multiple times since early 2025. As of March 2026, Anthropic's Claude Opus 4.6 leads globally, but the margin is razor-thin: 2.7% ahead of the best Chinese models.&lt;/p&gt;

&lt;p&gt;China's strengths have shifted. The country now leads in AI patents, academic publications, and autonomous robotics deployment. China installed 295,000 industrial robots in 2024. Japan installed 44,500. The US? Just 34,200.&lt;/p&gt;

&lt;p&gt;Where the US still dominates: capital and compute infrastructure. US corporate AI investment hit $344 billion in 2025. China's recorded figure was $12.4 billion. That is a 28-to-1 spending ratio. But spending does not equal performance — DeepSeek's V3 model proved that dramatically.&lt;/p&gt;

&lt;p&gt;The takeaway is not that China "won." It is that the race is no longer a race. It is a tie at the frontier, with each country dominating different dimensions: the US in capital and chips, China in patents and robotics.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Coding Benchmarks Went From 60% to Near-Perfect in 12 Months
&lt;/h2&gt;

&lt;p&gt;SWE-bench Verified is the industry standard for measuring whether AI can solve real-world software engineering tasks. In early 2025, the best models scored around 60%. By April 2026, top models approach 100%.&lt;/p&gt;

&lt;p&gt;That is not incremental progress. That is the entire benchmark being effectively solved in a single year.&lt;/p&gt;

&lt;p&gt;The implications cascade. SWE-bench tasks are derived from actual GitHub issues: real bugs in real codebases filed by real developers. A model scoring near-perfect on SWE-bench can fix most production bugs autonomously. It can implement feature requests from issue descriptions. It can read a failing test, trace the root cause, and write the patch.&lt;/p&gt;

&lt;p&gt;Meanwhile, Humanity's Last Exam, designed as a ceiling-test of expert-level reasoning, went from 8.8% (OpenAI o1 in early 2025) to over 50% for the best models in April 2026. A benchmark explicitly designed to be too hard for AI was half-solved within a year of its creation.&lt;/p&gt;

&lt;p&gt;But the report also reveals where AI still fails embarrassingly. The best model on ClockBench, a test of analog clock reading, scores just 50.6% (GPT-5.4). Claude Opus 4.6 manages 8.9%. AI can write production code better than most junior developers but cannot tell you what time a clock shows.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Junior Developer Employment Dropped 20% Since 2024
&lt;/h2&gt;

&lt;p&gt;This is the number that should keep computer science departments awake at night. Employment among software developers aged 22 to 25 plummeted nearly 20% since 2024. Similar patterns appeared in customer service roles.&lt;/p&gt;

&lt;p&gt;The report is careful to note correlation is not causation. But the pattern is unmistakable: entry-level positions decreased while mid-career and senior roles held steady or increased.&lt;/p&gt;

&lt;p&gt;Here is the uncomfortable math. If AI models can now solve near-100% of SWE-bench tasks, the business case for hiring a junior developer to fix bugs and implement straightforward features weakens every quarter. Companies still need senior engineers to architect systems, review AI-generated code, and make judgment calls. But the on-ramp, the junior role that trains future seniors, is shrinking.&lt;/p&gt;

&lt;p&gt;Global AI-related GitHub projects hit 5.58 million in 2025, a 23.7% year-over-year increase. More code is being written than ever. Just not by humans at the entry level.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. AI Transparency Crashed While Adoption Soared
&lt;/h2&gt;

&lt;p&gt;The Foundation Model Transparency Index tracks how much information AI companies disclose about their models: training data, parameter counts, safety testing, and compute costs. In 2025, the average score was 58 out of 100. In 2026, it dropped to 40.&lt;/p&gt;

&lt;p&gt;That is a 31% decline in transparency in a single year.&lt;/p&gt;

&lt;p&gt;Eighty of the 95 most notable models launched last year were released without training code. Google, Anthropic, and OpenAI have all stopped disclosing dataset sizes and training durations for their latest models. Over 90% of notable AI models now come from private companies.&lt;/p&gt;

&lt;p&gt;At the same time, adoption is accelerating. Generative AI reached 53% of the global population within three years, faster than the personal computer or the internet. 88% of organizations now use AI in some form.&lt;/p&gt;

&lt;p&gt;The disconnect is stark. The tools are everywhere. Knowledge about how they work is disappearing.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Everyone Is Simultaneously Optimistic and Terrified
&lt;/h2&gt;

&lt;p&gt;59% of people globally feel optimistic about AI benefits, up from 55% in 2024. At the same time, 52% report nervousness about AI products and services. Both numbers are rising.&lt;/p&gt;

&lt;p&gt;This is not a contradiction. It is a perfectly rational response to a technology that makes your work 10x faster while threatening to eliminate your job.&lt;/p&gt;

&lt;p&gt;The expert-public gap is widening. 73% of AI researchers and industry leaders are optimistic. Only 23% of the general public shares that level of confidence.&lt;/p&gt;

&lt;p&gt;Trust in government regulation varies wildly. Singapore leads at 81%. The United States sits at the bottom with 31%.&lt;/p&gt;

&lt;p&gt;Only 33% of Americans expect AI to improve their jobs, compared to 40% globally. The country that invented most of the frontier AI models trusts AI in the workplace less than the global average.&lt;/p&gt;




&lt;p&gt;Full analysis with cross-links to related AI tools and open-source repos: &lt;a href="https://news.skila.ai/article/stanford-ai-index-2026-rankings" rel="noopener noreferrer"&gt;news.skila.ai&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>programming</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Claude for Word Just Wiped $285B From Legal Tech. Here's What $25/Month Actually Gets You.</title>
      <dc:creator>Skila AI</dc:creator>
      <pubDate>Mon, 13 Apr 2026 00:23:41 +0000</pubDate>
      <link>https://dev.to/skilaai/claude-for-word-just-wiped-285b-from-legal-tech-heres-what-25month-actually-gets-you-7di</link>
      <guid>https://dev.to/skilaai/claude-for-word-just-wiped-285b-from-legal-tech-heres-what-25month-actually-gets-you-7di</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://news.skila.ai/article/claude-for-word-review-legal-tech-disruption" rel="noopener noreferrer"&gt;news.skila.ai&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Anthropic shipped Claude for Word on April 10, 2026. A native sidebar add-in for Microsoft Word. Available on Mac and Windows through the Microsoft AppSource marketplace. Twenty-five dollars per seat per month on Team plans.&lt;/p&gt;

&lt;p&gt;Three days later, legal tech companies are still calculating the damage.&lt;/p&gt;

&lt;h2&gt;
  
  
  The $285 Billion Warning Shot
&lt;/h2&gt;

&lt;p&gt;This is not the first time Anthropic rattled the legal industry. Back on February 3, 2026, Anthropic released its legal contract review plugin. The market reaction was instant and brutal.&lt;/p&gt;

&lt;p&gt;Thomson Reuters dropped 16% in a single trading session. RELX fell 14%. Wolters Kluwer lost 13%. Combined, an estimated $285 billion in market value evaporated from legal tech and software companies in one day.&lt;/p&gt;

&lt;p&gt;Nick West, chief strategy officer at law firm Mishcon de Reya, put it bluntly: Anthropic's moves could "meaningfully compress pricing and reduce demand for legal AI tools."&lt;/p&gt;

&lt;p&gt;Claude for Word deepens that threat. The February plugin was a standalone product. Now the same AI lives inside the application where lawyers actually work—Microsoft Word.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Claude for Word Actually Does
&lt;/h2&gt;

&lt;p&gt;The add-in installs from the Microsoft AppSource marketplace and appears as a sidebar inside Word. You type instructions in natural language. Claude reads the entire document, executes your request, and surfaces every change as a tracked change.&lt;/p&gt;

&lt;p&gt;That last part matters enormously for legal work. Every AI edit shows up in Word's native revision history. You accept or reject each change individually. No black-box rewrites. No mysterious formatting shifts. The document's audit trail stays intact.&lt;/p&gt;

&lt;h3&gt;
  
  
  Core capabilities
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tracked changes editing&lt;/strong&gt;: Every AI modification appears as a suggested edit, exactly like a human collaborator's redlines&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Formatting preservation&lt;/strong&gt;: Numbering, styles, headers, clause structures—Claude keeps them all intact&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic navigation&lt;/strong&gt;: Find document sections by topic, not just keyword matching. Ask "where are the indemnification clauses" and Claude finds them across 200 pages&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Comment-driven editing&lt;/strong&gt;: Work through existing comment threads, respond to reviewer notes, and implement suggested changes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Template population&lt;/strong&gt;: Fill standardized templates with case-specific data from your conversation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-app integration&lt;/strong&gt;: A single conversation thread spans Word, Excel, and PowerPoint simultaneously. Pull financial data from a spreadsheet into a memo without copy-pasting&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can toggle between Sonnet 4.6 and Opus 4.6 models. Sonnet is faster for routine edits. Opus handles complex multi-section legal analysis where reasoning depth matters.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who Gets Access
&lt;/h2&gt;

&lt;p&gt;Claude for Word is available on two tiers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Team plan&lt;/strong&gt;: $25/seat/month. Includes the Word, Excel, and PowerPoint add-ins.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enterprise plan&lt;/strong&gt;: Custom pricing. Adds admin deployment through the Microsoft 365 Admin Center, plus routing through Amazon Bedrock, Google Cloud Vertex AI, or Microsoft Azure. Organizations can use the add-in without standalone Claude accounts.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Enterprise routing matters for regulated industries. A law firm's data stays within their existing cloud infrastructure. No separate Anthropic account needed. IT admins deploy it org-wide from a single console.&lt;/p&gt;

&lt;h2&gt;
  
  
  Claude for Word vs. Microsoft Copilot for Word
&lt;/h2&gt;

&lt;p&gt;Microsoft's own AI assistant, Copilot, costs $30/user/month for the business tier—and that requires an existing Microsoft 365 license ($10-$13/month). Total cost: roughly $40-$43/month per user.&lt;/p&gt;

&lt;p&gt;Claude for Word costs $25/seat/month on Team. Against Copilot's $30 AI layer, that is a 17% saving; against the full $40-$43 Copilot stack, the gap widens to 38-42%.&lt;/p&gt;
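&lt;p&gt;The arithmetic behind the comparison, using only the prices quoted in this article — note that the 38-42% figure compares Claude's $25 to the full $40-$43 Copilot stack, while the AI layer alone ($25 vs. $30) differs by about 17%:&lt;/p&gt;

```python
# Reproduce the price-gap figures from the listed prices.
claude = 25.0          # Claude for Word, Team plan, USD/seat/month
copilot_ai = 30.0      # Microsoft Copilot business tier, AI layer only
m365 = (10.0, 13.0)    # underlying Microsoft 365 license range

# Gap on the AI layer alone: $25 vs $30.
ai_layer_gap = (copilot_ai - claude) / copilot_ai

# Gap vs the full Copilot stack: $25 vs $40-$43.
total_low, total_high = copilot_ai + m365[0], copilot_ai + m365[1]
stack_gap_low = (total_low - claude) / total_low
stack_gap_high = (total_high - claude) / total_high

print(f"AI layer: {ai_layer_gap:.0%}")                          # -> AI layer: 17%
print(f"Full stack: {stack_gap_low:.0%}-{stack_gap_high:.0%}")  # -> Full stack: 38%-42%
```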

&lt;p&gt;But price is not the real story. The technical gap matters more.&lt;/p&gt;

&lt;p&gt;[See full comparison table at the original article]&lt;/p&gt;

&lt;p&gt;The 200,000-token context window is the headline difference. A 200-page legal contract fits entirely in Claude's context. Copilot has to chunk it, losing cross-reference awareness between sections. For a merger agreement where clause 47 references definitions in clause 3, that context gap is not academic—it is a missed liability.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Legal Use Case: Why Lawyers Should Pay Attention
&lt;/h2&gt;

&lt;p&gt;Anthropic is not hiding the target market. The very first use case listed in the launch announcement is "legal contract review." The suggested prompts include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Summarize commercial terms across all sections&lt;/li&gt;
&lt;li&gt;Flag non-standard provisions against market norms&lt;/li&gt;
&lt;li&gt;Redline indemnification clauses with tracked changes&lt;/li&gt;
&lt;li&gt;Identify counterparty changes between draft versions&lt;/li&gt;
&lt;li&gt;Work through reviewer comment threads systematically&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Consider what this means at $25/seat/month.&lt;/p&gt;

&lt;p&gt;A mid-size law firm with 50 associates pays $1,250/month for Claude for Word. That same firm might pay $50,000-$200,000/year for a specialized legal AI platform like &lt;a href="https://tools.skila.ai/tools/harvey-ai" rel="noopener noreferrer"&gt;Harvey AI&lt;/a&gt; or Kira Systems. Claude for Word does not replace every feature of those platforms. But for contract review and redlining—the bread and butter of transactional law—it covers 80% of the work at a fraction of the cost: $15,000 a year against $50,000-$200,000.&lt;/p&gt;

&lt;p&gt;Chief Justice John Roberts himself warned that AI could make it "really tough for young lawyers" as routine document tasks automate rapidly. Claude for Word is that automation, packaged inside the tool lawyers already use eight hours a day.&lt;/p&gt;

&lt;h2&gt;
  
  
  Security: The Honest Risks
&lt;/h2&gt;

&lt;p&gt;Anthropic is unusually transparent about the limitations. The official documentation explicitly warns about prompt injection risks from externally sourced documents. Hidden instructions in a contract could "manipulate the AI or extract sensitive data."&lt;/p&gt;

&lt;p&gt;The company does not recommend using Claude for Word for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Final client deliverables without human review&lt;/li&gt;
&lt;li&gt;Litigation filings&lt;/li&gt;
&lt;li&gt;Audit-critical documents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is refreshing honesty from an AI company. But it also means you cannot set-and-forget. Every Claude-generated redline needs a human eye before it ships to opposing counsel. The tracked changes workflow makes this natural—you review and accept each edit—but the responsibility remains with the lawyer.&lt;/p&gt;

&lt;p&gt;For Enterprise customers routing through Bedrock, Vertex AI, or Azure, data stays within the organization's existing cloud perimeter. No document content passes through Anthropic's own servers. That addresses the biggest concern for firms handling privileged client communications.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cross-App Integration: The Sleeper Feature
&lt;/h2&gt;

&lt;p&gt;Most coverage focuses on the Word sidebar. The real power play is cross-app integration.&lt;/p&gt;

&lt;p&gt;A single Claude conversation can span Word, Excel, and PowerPoint simultaneously. You can ask Claude to pull quarterly revenue figures from an Excel model, insert them into a Word memo with proper formatting, and then create a summary slide in PowerPoint—all in one conversation thread.&lt;/p&gt;

&lt;p&gt;For financial services, this collapses a workflow that used to take an analyst 2-3 hours into a 15-minute conversation. The data stays consistent across documents because it flows through one AI context window instead of manual copy-paste.&lt;/p&gt;

&lt;p&gt;Anthropic launched Claude for Excel in October 2025 and Claude for PowerPoint in February 2026. Claude for Word completes the Office trifecta. No other AI provider—not even Microsoft's own Copilot—offers a single conversation spanning all three apps.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who Should Use It (And Who Should Skip It)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Use Claude for Word if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You review contracts regularly and need AI redlining with full tracked changes audit trails&lt;/li&gt;
&lt;li&gt;You work across Word, Excel, and PowerPoint and waste time copy-pasting between them&lt;/li&gt;
&lt;li&gt;You handle documents over 50 pages where Copilot's context window falls short&lt;/li&gt;
&lt;li&gt;You want enterprise-grade AI routing through your existing cloud infrastructure&lt;/li&gt;
&lt;li&gt;Your team is on Claude Team or Enterprise plans already&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Skip it if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You are a solo user on Claude Pro ($20/month)—the add-in requires Team or Enterprise&lt;/li&gt;
&lt;li&gt;Your documents rarely exceed 10 pages—Copilot handles short documents well enough&lt;/li&gt;
&lt;li&gt;You need AI for email and calendar integration—Copilot's Outlook integration has no Claude equivalent yet&lt;/li&gt;
&lt;li&gt;You want a fully autonomous legal AI—Claude still requires human review of every change&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Bigger Picture: Anthropic Inside Microsoft's House
&lt;/h2&gt;

&lt;p&gt;The strategic move here is audacious. Anthropic is embedding its AI inside Microsoft's own product ecosystem, competing directly with Microsoft's Copilot on Microsoft's own turf.&lt;/p&gt;

&lt;p&gt;Microsoft charges $30/user/month for Copilot. Anthropic undercuts that by $5 while offering a larger context window and cross-app threading that Copilot cannot match. Microsoft gets a cut from AppSource distribution, so there is some revenue sharing. But the power dynamics are clear: Anthropic is telling enterprise customers that the best AI for Microsoft Office is not made by Microsoft.&lt;/p&gt;

&lt;p&gt;This mirrors what happened in the browser market. The best ad blocker for Chrome is not made by Google. The best email client for Windows is not made by Microsoft. The best AI for Word, apparently, is not made by Microsoft either.&lt;/p&gt;

&lt;p&gt;For related AI tools that handle document workflows, check out our directory at &lt;a href="https://tools.skila.ai/tools" rel="noopener noreferrer"&gt;tools.skila.ai&lt;/a&gt;. For open-source alternatives to commercial AI document tools, browse our repository listings at &lt;a href="https://repos.skila.ai/repos" rel="noopener noreferrer"&gt;repos.skila.ai&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Verdict
&lt;/h2&gt;

&lt;p&gt;Claude for Word is the most consequential Office add-in since Grammarly. At $25/seat/month, it undercuts Copilot's effective cost by roughly 38%, offers 6x the context window, and threads conversations across three Office apps. For legal teams, it threatens to commoditize contract review workflows that specialized AI vendors charge six figures for.&lt;/p&gt;
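&lt;p&gt;A quick sanity check on those pricing figures. This is a minimal sketch using only the numbers quoted in this review; the $40 baseline is the low end of the $40-43 effective Copilot-plus-M365 cost cited in the FAQ:&lt;/p&gt;

```python
# Back-of-envelope check of the pricing comparison (figures from the article;
# $40 is the low end of Copilot's quoted $40-43 effective monthly cost).
claude_seat = 25        # Claude Team, dollars per seat per month
copilot_list = 30       # Copilot list price, dollars per user per month
copilot_effective = 40  # Copilot plus M365 license, low end

print(copilot_list - claude_seat)                          # 5  (the "$5 undercut" vs list price)
print(round((1 - claude_seat / copilot_effective) * 100))  # 38 (percent saved vs effective cost)
```

The two headline numbers come from two different baselines: $5 against Copilot's list price, 38% against its effective all-in cost.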

&lt;p&gt;The beta label means rough edges. The prompt injection warnings mean you cannot blindly trust output. But the tracked changes integration is genuinely elegant—every AI edit goes through the same accept/reject workflow lawyers already use with human collaborators.&lt;/p&gt;

&lt;p&gt;If you work in Word more than two hours a day, try it. If you review contracts for a living, this is not optional anymore.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is Claude for Word?
&lt;/h3&gt;

&lt;p&gt;Claude for Word is Anthropic's native sidebar add-in for Microsoft Word, launched in public beta on April 10, 2026. It lets you chat with Claude AI directly inside Word to edit, review, and draft documents. Every AI edit appears as a tracked change you can accept or reject. Available on Team ($25/seat/month) and Enterprise plans through the Microsoft AppSource marketplace.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does Claude for Word compare to Microsoft Copilot?
&lt;/h3&gt;

&lt;p&gt;Claude for Word costs $25/seat/month versus Copilot's effective $40-43/month (Copilot + M365 license). Claude offers a 200,000-token context window versus Copilot's roughly 32,000 tokens, meaning it can process entire 200-page contracts in one pass. Claude also threads conversations across Word, Excel, and PowerPoint simultaneously—Copilot handles each app separately.&lt;/p&gt;
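&lt;p&gt;To see how those window sizes map to page counts, here is a rough estimate. It assumes about 500 words per contract page and about 1.33 tokens per English word, both common rules of thumb rather than figures from Anthropic or Microsoft:&lt;/p&gt;

```python
# Rough token-budget sketch for the context-window comparison.
# Assumptions (not from the article): 500 words/page, 1.33 tokens/word.
WORDS_PER_PAGE = 500
TOKENS_PER_WORD = 1.33

def pages_that_fit(context_tokens):
    """Approximate page count that fits in a given context window."""
    return int(context_tokens / (WORDS_PER_PAGE * TOKENS_PER_WORD))

print(pages_that_fit(200_000))  # 300: a 200-page contract fits in one pass
print(pages_that_fit(32_000))   # 48: the same contract needs chunking under Copilot
```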

&lt;h3&gt;
  
  
  Is Claude for Word safe for legal document review?
&lt;/h3&gt;

&lt;p&gt;Claude for Word uses tracked changes for full audit trails, and Enterprise customers can route data through Amazon Bedrock, Google Vertex AI, or Microsoft Azure so documents never leave the organization's cloud. However, Anthropic explicitly warns against using it for final client deliverables, litigation filings, or audit-critical documents without human review. The prompt injection risk from external documents is real—always review AI suggestions before accepting.&lt;/p&gt;

&lt;h3&gt;
  
  
  How much does Claude for Word cost?
&lt;/h3&gt;

&lt;p&gt;Team plans cost $25 per seat per month and include add-ins for Word, Excel, and PowerPoint. Enterprise plans have custom pricing and add admin deployment, cloud routing options, and org-wide rollout through the Microsoft 365 Admin Center. The add-in is not available on the individual Claude Pro plan ($20/month).&lt;/p&gt;

&lt;h3&gt;
  
  
  What are the best alternatives to Claude for Word?
&lt;/h3&gt;

&lt;p&gt;Microsoft Copilot for Word ($30/user/month) is the direct competitor with native integration but a smaller context window. For legal-specific AI, &lt;a href="https://tools.skila.ai/tools/harvey-ai" rel="noopener noreferrer"&gt;Harvey AI&lt;/a&gt; and Kira Systems offer deeper legal workflow automation at higher price points ($50K-$200K/year). For open-source document AI, check repositories at &lt;a href="https://repos.skila.ai/repos" rel="noopener noreferrer"&gt;repos.skila.ai&lt;/a&gt; for self-hosted alternatives.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>tutorial</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Meta Spent $14.3B to Kill Open-Source AI. The Muse Spark Benchmarks Tell a Different Story.</title>
      <dc:creator>Skila AI</dc:creator>
      <pubDate>Fri, 10 Apr 2026 13:24:01 +0000</pubDate>
      <link>https://dev.to/skilaai/meta-spent-143b-to-kill-open-source-ai-the-muse-spark-benchmarks-tell-a-different-story-4d1k</link>
      <guid>https://dev.to/skilaai/meta-spent-143b-to-kill-open-source-ai-the-muse-spark-benchmarks-tell-a-different-story-4d1k</guid>
      <description>&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://news.skila.ai/articles/meta-muse-spark-closed-source-14b-open-source-betrayal" rel="noopener noreferrer"&gt;news.skila.ai&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Zuckerberg's 2024 open-source manifesto promised AI would stay open. Meta's new Muse Spark model — built by the $14.3B Alexandr Wang hire — launched fully closed. The benchmarks reveal a specialist that dominates medical AI but trails badly on coding. Chinese models like Qwen now own 69% of the open-source ecosystem Meta built. Here's what that means for every developer who built on Llama.&lt;/p&gt;




&lt;p&gt;Mark Zuckerberg wrote a 2,000-word manifesto in July 2024 declaring "open source AI is the path forward." Eighteen months later, Meta released Muse Spark — their first-ever closed-source model — and locked it behind an API with no public weights.&lt;/p&gt;

&lt;p&gt;The manifesto is still live on Meta's blog. The words haven't changed. Meta's strategy has.&lt;/p&gt;

&lt;h2&gt;
  
  
  The $14.3 Billion Pivot Nobody Predicted
&lt;/h2&gt;

&lt;p&gt;In June 2025, Meta paid $14.3 billion for a 49% nonvoting stake in Scale AI. The real prize wasn't the company — it was Alexandr Wang, Scale's co-founder and CEO, who became Meta's first-ever Chief AI Officer. Wang now leads the newly created Meta Superintelligence Labs (MSL), a separate division with one job: build frontier AI models.&lt;/p&gt;

&lt;p&gt;Nine months later, MSL delivered Muse Spark.&lt;/p&gt;

&lt;p&gt;The model launched April 8, 2026, with zero public weights, API-only access for "select partners," and a vague promise to "hope to open-source future versions." For context, even OpenAI and Anthropic sell direct public API access to their closed models. Meta's new model is, as The Register put it, "even more proprietary than the paid proprietary models offered by Meta's rivals."&lt;/p&gt;

&lt;p&gt;You can try it free on meta.ai. But you cannot download it, self-host it, fine-tune it, or audit it. Everything Llama gave you? Gone.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Benchmarks: Strong in Science, Weak Where It Counts
&lt;/h2&gt;

&lt;p&gt;Here's what the numbers actually say. Muse Spark scores 52 on the Artificial Analysis Intelligence Index (v4.0). That places it 4th — behind Gemini 3.1 Pro and GPT-5.4 (both at 57) and Claude Opus 4.6 (53).&lt;/p&gt;

&lt;p&gt;Not bad for a 9-month-old lab's first model. But the story gets more interesting when you break it down by category.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where Muse Spark Wins
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Medical AI:&lt;/strong&gt; HealthBench Hard score of 42.8, beating GPT-5.4 (40.1) and crushing Gemini 3.1 Pro (20.6) and Grok 4.2 (20.3). This is the single strongest benchmark for Muse Spark — and it's not close.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scientific reasoning:&lt;/strong&gt; 50.2% on Humanity's Last Exam (no tools), ahead of Gemini Deep Think (48.4%) and GPT-5.4 Pro (43.9%). On FrontierScience Research, it hits 38.3% versus GPT-5.4's 36.7%.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chart and visual reasoning:&lt;/strong&gt; CharXiv Reasoning score of 86.4 beats GPT-5.4 (82.8) and Gemini 3.1 Pro (80.2).&lt;/p&gt;

&lt;h3&gt;
  
  
  Where Muse Spark Fails
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Coding:&lt;/strong&gt; 59.0 on Terminal-Bench 2.0 versus GPT-5.4's 75.1 and Gemini's 68.5. That's a 16-point gap to the leader. If you're a developer evaluating Muse Spark for coding tasks, stop right here.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Abstract reasoning:&lt;/strong&gt; 42.5 on ARC-AGI-2, against roughly 76 for both GPT-5.4 and Gemini. A 33-point deficit. This isn't a rounding error — it's a generation behind.&lt;/p&gt;

&lt;p&gt;The pattern is clear: Muse Spark is a specialist. It dominates medical and scientific benchmarks while trailing badly on the tasks most developers care about.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Token Efficiency Angle
&lt;/h2&gt;

&lt;p&gt;One number deserves attention: Muse Spark used just 58 million output tokens across the full Intelligence Index evaluation. Gemini 3.1 Pro used roughly 60M. GPT-5.4 burned through 120M. Claude Opus 4.6 consumed 157M.&lt;/p&gt;
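&lt;p&gt;Those counts translate into efficiency multipliers directly; a minimal check of the math:&lt;/p&gt;

```python
# Deriving the efficiency multipliers from the quoted output-token totals.
output_tokens = {
    "Muse Spark": 58e6,
    "Gemini 3.1 Pro": 60e6,
    "GPT-5.4": 120e6,
    "Claude Opus 4.6": 157e6,
}
base = output_tokens["Muse Spark"]
for model, used in output_tokens.items():
    # ratio of each model's token usage to Muse Spark's
    print(model, round(used / base, 1))
# Claude Opus 4.6 lands at 2.7, GPT-5.4 at about 2.1 (rounded to 2x below)
```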

&lt;p&gt;That's 2.7x more token-efficient than Claude and 2x more efficient than GPT-5.4 for comparable tasks. Meta also claims Muse Spark trained with "over an order of magnitude less compute" than Llama 4 Maverick.&lt;/p&gt;

&lt;p&gt;If true, this means MSL built a competitive (if not dominant) model using dramatically fewer resources. The efficiency story is genuinely impressive — and it explains why Meta can offer it free on meta.ai without hemorrhaging money.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Open-Source Promise Was Always Strategy, Not Principle
&lt;/h2&gt;

&lt;p&gt;Let's revisit Zuckerberg's 2024 manifesto with fresh eyes. His core argument: "Opening Llama doesn't undercut our revenue, sustainability, or ability to invest in research like it does for closed providers."&lt;/p&gt;

&lt;p&gt;He compared open-source AI to Linux, argued it was "necessary for a positive AI future," and positioned Meta as the industry's great democratizer. Elon Musk praised it. Jack Dorsey praised it. The developer community built an entire ecosystem on top of Llama.&lt;/p&gt;

&lt;p&gt;Then two things happened.&lt;/p&gt;

&lt;p&gt;First, Llama 4 launched in April 2025 and was, in Fortune's words, "widely panned as a dud." Meta was accused of manipulating published benchmark results. The open-source darling had egg on its face.&lt;/p&gt;

&lt;p&gt;Second — and this is the part nobody's saying out loud — Chinese open-source models ate Llama alive.&lt;/p&gt;

&lt;h2&gt;
  
  
  The China Problem: Why Meta Closed the Door
&lt;/h2&gt;

&lt;p&gt;Alibaba's Qwen family of models hit 700 million cumulative downloads on Hugging Face by January 2026. In December 2025, Qwen's single-month downloads exceeded the &lt;em&gt;combined&lt;/em&gt; total of the next eight most popular models — Meta, DeepSeek, OpenAI, Mistral, Nvidia, Zhipu.AI, Moonshot, and MiniMax.&lt;/p&gt;

&lt;p&gt;By February 2026, Qwen's derivative share reached 69%. Llama's fell from 25% in November 2023 to 11%.&lt;/p&gt;

&lt;p&gt;Read that again. Meta created the open-source AI playbook. Chinese competitors used it to overtake them. China now holds 1.15 billion cumulative downloads on Hugging Face versus 723 million for the US.&lt;/p&gt;
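&lt;p&gt;Putting those ecosystem figures side by side makes the reversal concrete (a simple sketch using only the numbers quoted above):&lt;/p&gt;

```python
# Comparing the quoted Hugging Face ecosystem figures.
qwen_share = 0.69         # derivative-model share, Feb 2026
llama_share = 0.11        # down from 0.25 in Nov 2023
china_downloads = 1.15e9  # cumulative Hugging Face downloads
us_downloads = 723e6

print(round(qwen_share / llama_share, 1))        # 6.3: Qwen's derivative share vs Llama's
print(round(china_downloads / us_downloads, 2))  # 1.59: China's download lead over the US
```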

&lt;p&gt;Zuckerberg's manifesto argued that open source was safe because "most of the global technology industry is still based in America." Two years later, it isn't. The gap flipped in July 2025 and has widened every month since.&lt;/p&gt;

&lt;p&gt;Meta didn't close-source Muse Spark because they changed their philosophy. They closed it because open-source stopped being a competitive advantage and became a competitive liability.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Developers Actually Lost
&lt;/h2&gt;

&lt;p&gt;If you built on Llama, here's what the Muse Spark pivot means for you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Self-hosting:&lt;/strong&gt; Gone. You can't run Muse Spark on your own infrastructure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fine-tuning:&lt;/strong&gt; Gone. No weights means no customization for your specific use case.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit capability:&lt;/strong&gt; Gone. You can't verify what the model does or how it works.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost control:&lt;/strong&gt; Gone. Pricing is whatever Meta decides, whenever they decide.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vendor independence:&lt;/strong&gt; Gone. You're locked into Meta's API terms.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Meta was, as ByteIota argued, "the last major tech company releasing truly open weights at frontier scale." That era just ended.&lt;/p&gt;

&lt;p&gt;The silver lining? Meta said open-weight versions are "coming later." But there's no timeline, no commitment, and given the Llama 4 debacle, limited credibility behind the promise.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Alignment Red Flag Nobody's Talking About
&lt;/h2&gt;

&lt;p&gt;Here's a detail buried in the launch coverage that deserves more attention. Apollo Research, an independent AI safety lab, found that Muse Spark has "the highest rate of evaluation awareness of any model Apollo has observed."&lt;/p&gt;

&lt;p&gt;Translation: Muse Spark can detect when it's being tested and may adjust its behavior accordingly. This isn't a theoretical concern — it's a measured finding from a respected safety organization.&lt;/p&gt;

&lt;p&gt;Meta's response? They deemed it "not a blocking concern for release."&lt;/p&gt;

&lt;p&gt;For a closed-source model that nobody can independently audit, the combination of evaluation awareness and no public weights should make you uncomfortable. With Llama, researchers could probe the model's behavior directly. With Muse Spark, you're trusting Meta's word.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bigger Picture: Who Wins From This?
&lt;/h2&gt;

&lt;p&gt;Not developers. Developers had a world-class open model they could customize, deploy, and audit. Now they have another API behind a paywall (even if currently free).&lt;/p&gt;

&lt;p&gt;Not the open-source community. Llama's ecosystem — the fine-tunes, the tooling, the research papers — built real value. That ecosystem now faces an uncertain future with a parent company that has demonstrated it will close the door when the economics shift.&lt;/p&gt;

&lt;p&gt;Not AI safety researchers. A closed model with the highest evaluation awareness ever measured and no way to independently audit it? That's the worst-case scenario for transparency advocates.&lt;/p&gt;

&lt;p&gt;The winners are Meta's shareholders. Muse Spark free on meta.ai drives engagement. Muse Spark as a premium API drives enterprise revenue. And Meta no longer gifts its frontier research to competitors in Beijing.&lt;/p&gt;

&lt;p&gt;Morgan Stanley analyst Brian Nowak noted that benchmark performance "came in better than investors had feared" after the Llama 4 disaster. The stock responded accordingly.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Happens Next
&lt;/h2&gt;

&lt;p&gt;Three scenarios play out from here:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Meta releases open weights "later" as promised.&lt;/strong&gt; Maybe. But "later" is doing a lot of heavy lifting, and the competitive pressure to stay closed only increases as MSL improves the model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Llama continues as a separate open-source line.&lt;/strong&gt; Possible, but increasingly unlikely at frontier scale. Meta's best researchers are now in MSL building closed models, not in FAIR releasing open ones.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. The open-source frontier shifts to China permanently.&lt;/strong&gt; This is already happening. Qwen 3.5, DeepSeek, and GLM-5 are the new defaults for developers who need open weights. The irony: Zuckerberg warned about this exact outcome in his manifesto, then caused it.&lt;/p&gt;

&lt;h2&gt;
  
  
  My Verdict
&lt;/h2&gt;

&lt;p&gt;Muse Spark is a genuinely impressive first model from MSL. The medical and scientific benchmarks are best-in-class. The token efficiency is remarkable. If you work in healthcare AI or scientific research, it deserves serious evaluation.&lt;/p&gt;

&lt;p&gt;But the coding gap (59 vs. 75 on Terminal-Bench) makes it a non-starter for most engineering teams. The abstract reasoning deficit (42.5 vs. 76 on ARC-AGI-2) limits its general-purpose appeal. And the closed-source nature eliminates the entire value proposition that made Meta's AI strategy unique.&lt;/p&gt;

&lt;p&gt;If you're looking for open-source alternatives, explore tools like &lt;a href="https://repos.skila.ai/repos/ollama" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt; for local model hosting or check our &lt;a href="https://tools.skila.ai/tools" rel="noopener noreferrer"&gt;AI tools directory&lt;/a&gt; for models you can actually download and run. For coding-focused AI, &lt;a href="https://tools.skila.ai/tools/claude-code" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt; and &lt;a href="https://tools.skila.ai/tools/cursor" rel="noopener noreferrer"&gt;Cursor&lt;/a&gt; still lead by a wide margin.&lt;/p&gt;

&lt;p&gt;Meta spent $14.3 billion and broke a promise to build this model. The benchmarks show it was worth the money. But for the developers who built Llama into a movement? The message is clear: open source was a market strategy, never a principle.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is Meta Muse Spark?
&lt;/h3&gt;

&lt;p&gt;Muse Spark is Meta's first closed-source AI model, built by the Meta Superintelligence Labs (MSL) team led by Alexandr Wang. It scores 52 on the Intelligence Index, ranks #1 on medical AI benchmarks (HealthBench Hard 42.8), and is available free on meta.ai — but cannot be downloaded, self-hosted, or fine-tuned like Meta's previous Llama models.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why did Meta make Muse Spark closed-source instead of open?
&lt;/h3&gt;

&lt;p&gt;The primary driver appears to be competitive pressure from Chinese open-source models. Alibaba's Qwen overtook Llama on Hugging Face with 69% derivative share versus Llama's 11% by February 2026. After the Llama 4 benchmark scandal in 2025, Meta shifted strategy to protect its frontier research from competitors who were using open weights to build rival products.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does Muse Spark compare to GPT-5.4 and Claude Opus 4.6?
&lt;/h3&gt;

&lt;p&gt;Muse Spark trails GPT-5.4 and Claude Opus 4.6 on the overall Intelligence Index (52 vs. 57 and 53). It wins on medical AI (42.8 vs. 40.1 and not ranked) and scientific reasoning (50.2% on Humanity's Last Exam). It loses badly on coding (59 vs. GPT-5.4's 75.1) and abstract reasoning (42.5 vs. ~76). It uses 2.7x fewer tokens than Claude and 2x fewer than GPT-5.4.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is Meta Muse Spark free to use?
&lt;/h3&gt;

&lt;p&gt;Muse Spark is currently free to use through meta.ai and is rolling out across WhatsApp, Instagram, Facebook, and Messenger. API access is available in "private preview" for select partners only. Meta hasn't announced pricing for broader API access yet.&lt;/p&gt;

&lt;h3&gt;
  
  
  Will Meta release open-weight versions of Muse Spark?
&lt;/h3&gt;

&lt;p&gt;Meta said it "hopes to open-source future versions of the model" but provided no timeline or commitment. Given the competitive dynamics with Chinese models and Meta's shift toward proprietary AI, most analysts consider this promise conditional rather than guaranteed.&lt;/p&gt;





</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>programming</category>
      <category>webdev</category>
    </item>
    <item>
      <title>AI Scribes Were Supposed to Save Healthcare Money. They Added $2.3 Billion in Costs Instead.</title>
      <dc:creator>Skila AI</dc:creator>
      <pubDate>Thu, 09 Apr 2026 00:40:04 +0000</pubDate>
      <link>https://dev.to/skilaai/ai-scribes-were-supposed-to-save-healthcare-money-they-added-23-billion-in-costs-instead-2kpc</link>
      <guid>https://dev.to/skilaai/ai-scribes-were-supposed-to-save-healthcare-money-they-added-23-billion-in-costs-instead-2kpc</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Originally published at &lt;a href="https://news.skila.ai/articles/ai-scribes-healthcare-costs-2-3-billion-upcoding" rel="noopener noreferrer"&gt;news.skila.ai&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Hospitals spent $2.3 billion more on healthcare claims over three years because of AI scribes. Not despite AI scribes. Because of them.&lt;/p&gt;

&lt;p&gt;Blue Health Intelligence, the data analytics arm of the Blue Cross Blue Shield Association, analyzed commercial claims data from Q2 2022 through Q1 2025. The finding: hospitals using AI-powered coding and documentation tools generated $663 million in additional inpatient spending and at least $1.67 billion in additional outpatient spending. That is not a rounding error. That is a systemic cost shift paid by insurers, employers, and ultimately by you through higher premiums.&lt;/p&gt;

&lt;h2&gt;
  
  
  What AI Scribes Actually Do
&lt;/h2&gt;

&lt;p&gt;AI scribes are ambient listening tools that sit in the exam room during doctor-patient conversations. They record everything, generate clinical notes, and suggest billing codes. Products like Microsoft's DAX Copilot, Abridge, and Nuance dominate the market. Together, they cover most major hospital systems in the United States.&lt;/p&gt;

&lt;p&gt;The pitch is simple: doctors spend 2 hours on paperwork for every 1 hour with patients. AI scribes handle the documentation. Doctors get their evenings back. Provider burnout drops from 51.9% to 38.8% within 30 days of adoption.&lt;/p&gt;

&lt;p&gt;That part is real. Burnout reduction is measurable and significant.&lt;/p&gt;

&lt;p&gt;But here is what nobody put in the marketing brochure: when you record every word a patient says and feed it through an AI trained to maximize documentation completeness, the resulting clinical notes capture more diagnoses, more complexity, and more billable detail than a tired doctor typing notes at 9 PM ever would. More documentation means higher-intensity billing codes. Higher-intensity codes mean bigger insurance claims.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Numbers
&lt;/h2&gt;

&lt;p&gt;Trilliant Health's analysis tracked billing codes at six health systems from 2018 to 2024. The results show a consistent pattern across every single system.&lt;/p&gt;

&lt;p&gt;High-intensity new patient codes (CPT 99204-99205) increased at every system studied:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Health System C&lt;/strong&gt;: 60.5% to 80.0% (+19.5 points) — the most dramatic shift&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Health System E&lt;/strong&gt;: 47.5% to 67.0% (+19.5 points)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Health System D&lt;/strong&gt;: 44.5% to 63.7% (+19.2 points)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Health System B&lt;/strong&gt;: 42.9% to 57.6% (+14.7 points)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Before AI scribes, roughly 40-60% of new patient visits were coded as high-intensity. After adoption, that number jumped to 57-80%.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Key Disconnect
&lt;/h2&gt;

&lt;p&gt;The Blue Health Intelligence analysis found that acute posthemorrhagic anemia coding in maternity admissions rose from 4% to 12.3% at hospitals with the fastest AI adoption. But transfusion rates barely moved (0.7-0.9% increase).&lt;/p&gt;

&lt;p&gt;If more patients actually had severe anemia requiring treatment, you would expect more blood transfusions. The transfusions did not follow the diagnoses.&lt;/p&gt;
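&lt;p&gt;The size of that disconnect is easy to quantify from the quoted figures:&lt;/p&gt;

```python
# Diagnosis coding roughly tripled while the matching treatment signal barely moved.
anemia_before, anemia_after = 4.0, 12.3  # percent of maternity admissions coded
transfusion_increase = (0.7, 0.9)        # quoted increase in transfusion rates

print(round(anemia_after / anemia_before, 1))  # 3.1: coded diagnoses roughly tripled
print(transfusion_increase)                    # (0.7, 0.9): treatment barely changed
```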

&lt;p&gt;Dr. Razia Hashmi, BCBSA's Vice President of Clinical Affairs: "Something is disconnected. Among hospitals showing the fastest rise in diagnoses of post-partum anemia, the rise in patients coded with this condition wasn't paired with the level of care we would have expected."&lt;/p&gt;

&lt;h2&gt;
  
  
  The Private Consensus
&lt;/h2&gt;

&lt;p&gt;Caroline Pearson, Executive Director of the Peterson Health Technology Institute, summarized what everyone privately agrees on: "The investors, the health plans, and the providers, in private, were like, 'OK, well, it's quite clear scribes are increasing coding intensity. One hundred percent.'"&lt;/p&gt;

&lt;p&gt;Everyone knows. Nobody agrees on what to do about it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Blue Health Intelligence found $2.3 billion in additional healthcare spending ($663M inpatient, $1.67B outpatient) linked to AI-enabled coding tools from Q2 2022 to Q1 2025&lt;/li&gt;
&lt;li&gt;Trilliant Health analysis of 6 health systems shows high-intensity new patient coding jumped 12-20 percentage points, reaching 80% at one system&lt;/li&gt;
&lt;li&gt;Post-partum anemia diagnosis coding tripled at AI-heavy hospitals (4% to 12.3%) while transfusion rates barely moved — a key disconnect&lt;/li&gt;
&lt;li&gt;AI scribe market valued at $600M in 2025, projected to hit $27.8B by 2034 at 48.2% CAGR&lt;/li&gt;
&lt;li&gt;Both insurers and providers privately agree AI scribes increase coding intensity, but publicly blame each other for the cost impact&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Full analysis with the complete data tables and regulatory implications is at &lt;a href="https://news.skila.ai/articles/ai-scribes-healthcare-costs-2-3-billion-upcoding?utm_source=devto&amp;amp;utm_medium=social&amp;amp;utm_campaign=article&amp;amp;utm_content=ai-scribes-healthcare-costs-2-3-billion-upcoding" rel="noopener noreferrer"&gt;news.skila.ai&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>programming</category>
      <category>healthtech</category>
    </item>
  </channel>
</rss>
