<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Aamer Mihaysi</title>
    <description>The latest articles on DEV Community by Aamer Mihaysi (@o96a).</description>
    <link>https://dev.to/o96a</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3788049%2F0328b800-a998-4432-bdf0-3308cad77288.jpeg</url>
      <title>DEV Community: Aamer Mihaysi</title>
      <link>https://dev.to/o96a</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/o96a"/>
    <language>en</language>
    <item>
      <title>Embeddings Just Went Multimodal: What Sentence Transformers 5.4 Means for RAG</title>
      <dc:creator>Aamer Mihaysi</dc:creator>
      <pubDate>Thu, 09 Apr 2026 13:59:33 +0000</pubDate>
      <link>https://dev.to/o96a/embeddings-just-went-multimodal-what-sentence-transformers-54-means-for-rag-247e</link>
      <guid>https://dev.to/o96a/embeddings-just-went-multimodal-what-sentence-transformers-54-means-for-rag-247e</guid>
      <description>&lt;p&gt;The latest Sentence Transformers release quietly changes something fundamental. Version 5.4 adds native multimodal support—same API, same patterns, but now you can encode and compare text, images, audio, and video in a shared embedding space.&lt;/p&gt;

&lt;p&gt;This isn't a wrapper. It's a direct extension of the embedding workflow that most RAG pipelines already use.&lt;/p&gt;

&lt;h2&gt;The Shift&lt;/h2&gt;

&lt;p&gt;Traditional embedding models convert text into fixed-size vectors. You encode a query, encode your documents, compute cosine similarity. Works great until someone wants to search for "that screenshot with the error message" or "the slide deck about Q3 projections."&lt;/p&gt;

&lt;p&gt;Multimodal embedding models solve this by mapping inputs from different modalities into the same embedding space. A text query and an image document now share a coordinate system. Same similarity functions. Same retrieval logic. Different modalities.&lt;/p&gt;
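&lt;p&gt;The point is easy to see with plain vectors. A toy sketch (hand-made 4-dimensional vectors standing in for real embeddings) shows that the similarity math is identical no matter which modality produced the vector:&lt;/p&gt;

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity works on any two vectors of equal dimension,
    # regardless of which modality produced them.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-d "embeddings" standing in for real model output.
text_vec  = np.array([0.9, 0.1, 0.0, 0.4])   # "quarterly revenue report"
image_vec = np.array([0.8, 0.2, 0.1, 0.5])   # revenue chart screenshot
audio_vec = np.array([0.0, 0.9, 0.8, 0.1])   # unrelated podcast clip

print(cosine_sim(text_vec, image_vec))  # high: same underlying content
print(cosine_sim(text_vec, audio_vec))  # low: unrelated
```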

&lt;h2&gt;What's Actually New&lt;/h2&gt;

&lt;p&gt;Sentence Transformers 5.4 adds:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multimodal encoding&lt;/strong&gt;: &lt;code&gt;model.encode()&lt;/code&gt; now handles images, audio, and video alongside text&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-modal reranking&lt;/strong&gt;: Score relevance between mixed-modality pairs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unified API&lt;/strong&gt;: No new abstractions—load a model, encode inputs, compute similarity
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sentence_transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SentenceTransformer&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SentenceTransformer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Qwen/Qwen3-VL-Embedding-2B&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Encode different modalities
&lt;/span&gt;&lt;span class="n"&gt;text_emb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;quarterly revenue report&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;img_emb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;path/to/screenshot.png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Same similarity function you already use
&lt;/span&gt;&lt;span class="n"&gt;similarity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;similarity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text_emb&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;img_emb&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The reranker models extend similarly—you can score pairs where one or both elements are images.&lt;/p&gt;

&lt;h2&gt;Why This Matters for RAG&lt;/h2&gt;

&lt;p&gt;Most production RAG systems are text-only. When users ask about visual content, you either:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Run OCR on everything (slow, lossy)&lt;/li&gt;
&lt;li&gt;Ignore it entirely (incomplete)&lt;/li&gt;
&lt;li&gt;Build a parallel image search system (complex, disconnected)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Multimodal embeddings collapse these into one pipeline. Your retrieval step can surface relevant images alongside text chunks without OCR preprocessing or separate indices.&lt;/p&gt;
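&lt;p&gt;At the index level, "one pipeline" just means one matrix. A minimal sketch with random stand-in vectors in place of real embeddings (the file names are invented):&lt;/p&gt;

```python
import numpy as np

# One index for everything: each row is the embedding of a text chunk,
# an image, or an audio clip. Random vectors stand in for model output.
rng = np.random.default_rng(0)
corpus = {
    "intro.md": rng.normal(size=32),
    "revenue_chart.png": rng.normal(size=32),
    "allhands_recording.mp3": rng.normal(size=32),
}
ids = list(corpus)
matrix = np.stack([corpus[k] / np.linalg.norm(corpus[k]) for k in ids])

def search(query_vec, top_k=2):
    # Cosine similarity against every row, whatever its source modality.
    q = query_vec / np.linalg.norm(query_vec)
    scores = matrix @ q
    order = np.argsort(-scores)[:top_k]
    return [(ids[i], round(float(scores[i]), 3)) for i in order]

# A query vector near the chart's embedding surfaces the image first,
# with no OCR step and no separate image index.
query = corpus["revenue_chart.png"] + rng.normal(scale=0.1, size=32)
print(search(query))
```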

&lt;p&gt;The reranking layer matters here too. Cross-encoder rerankers have been essential for text RAG because they score query-document pairs more accurately than embedding similarity alone. Multimodal rerankers extend that to visual documents.&lt;/p&gt;
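&lt;p&gt;The two-stage shape stays the same as in text RAG. In this sketch the reranker is mocked with a word-overlap scorer (a real cross-modal reranker would run each query/candidate pair through the model jointly, and the candidate could be an image path):&lt;/p&gt;

```python
def score_pair(query, candidate):
    # Stand-in scorer: word overlap. A real reranker scores the pair
    # jointly inside the model rather than comparing precomputed vectors.
    q = set(query.lower().split())
    c = set(candidate.lower().split())
    return len(q.intersection(c)) / max(len(q), 1)

def rerank(query, candidates, top_k=2):
    # Stage two: re-order the first-stage retrieval hits by pair score.
    return sorted(candidates, key=lambda c: score_pair(query, c), reverse=True)[:top_k]

first_stage_hits = [
    "slide deck about hiring plans",
    "chart of quarterly revenue by region",
    "quarterly revenue summary memo",
]
print(rerank("quarterly revenue report", first_stage_hits))
```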

&lt;h2&gt;The Hardware Reality&lt;/h2&gt;

&lt;p&gt;There's a catch. VLM-based models like Qwen3-VL-2B need ~8GB VRAM. The 8B variants need ~20GB. CPU inference is "extremely slow" per the docs—CLIP and text-only models are better suited there.&lt;/p&gt;

&lt;p&gt;For production systems with GPU infrastructure, this is manageable. For edge deployments, you'll want smaller models or cloud inference.&lt;/p&gt;

&lt;h2&gt;The Practical Impact&lt;/h2&gt;

&lt;p&gt;This changes what you can retrieve:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Visual document RAG&lt;/strong&gt;: Search PDFs with embedded charts, screenshots, and diagrams&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-modal search&lt;/strong&gt;: Find video clips from text descriptions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multimodal deduplication&lt;/strong&gt;: Identify near-duplicates across modalities&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The API stays familiar. The infrastructure requirements shift. The use cases expand.&lt;/p&gt;

&lt;h2&gt;What We're Still Missing&lt;/h2&gt;

&lt;p&gt;The release handles encoding and reranking well, but production multimodal RAG needs more:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Index efficiency&lt;/strong&gt;: FAISS and similar indices weren't designed for mixed-modality queries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chunking strategies&lt;/strong&gt;: How do you chunk a video? What about image grids?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Evaluation frameworks&lt;/strong&gt;: BEIR and MTEB are text-only; multimodal benchmarks are sparse&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These will get solved. The embedding layer is now in place.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;The gap between text RAG and multimodal RAG just got smaller. The question is whether your retrieval pipeline can handle what's now possible.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>rag</category>
      <category>embeddings</category>
    </item>
    <item>
      <title>Meta's Muse Spark Has 16 Tools and a Secret Weapon: Your Instagram Posts</title>
      <dc:creator>Aamer Mihaysi</dc:creator>
      <pubDate>Thu, 09 Apr 2026 13:21:27 +0000</pubDate>
      <link>https://dev.to/o96a/metas-muse-spark-has-16-tools-and-a-secret-weapon-your-instagram-posts-37mg</link>
      <guid>https://dev.to/o96a/metas-muse-spark-has-16-tools-and-a-secret-weapon-your-instagram-posts-37mg</guid>
      <description>&lt;p&gt;Meta just shipped their first model since Llama 4. And the model itself might not be the story.&lt;/p&gt;

&lt;p&gt;Muse Spark launched today — competitive with Opus 4.6, Gemini 3.1 Pro, and GPT-5.4 on benchmarks, but currently available only as a hosted service via private API preview. You can try it on meta.ai if you have a Facebook or Instagram login.&lt;/p&gt;

&lt;p&gt;Here's what's actually interesting: &lt;strong&gt;Meta didn't just release a model. They released a full agent platform with 16 tools baked in.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;The tool stack they shipped&lt;/h2&gt;

&lt;p&gt;Simon Willison poked around the meta.ai interface and extracted the complete tool catalog. It's worth reading in full, but the highlights:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;browser.search / browser.open / browser.find&lt;/strong&gt; — Web search and page analysis. Standard pattern now, but solid.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;meta_1p.content_search&lt;/strong&gt; — This is the sleeper. Semantic search across Instagram, Threads, and Facebook posts you have access to, filtered by author, celebrity mentions, comments, likes. Posts since January 2025 only. This turns your social graph into queryable context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;container.python_execution&lt;/strong&gt; — Full Code Interpreter with Python 3.9, pandas, numpy, matplotlib, scikit-learn, OpenCV, Pillow, PyMuPDF. Files persist at &lt;code&gt;/mnt/data/&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;container.visual_grounding&lt;/strong&gt; — This is Segment Anything integrated directly into the chat. Give it an image path and object names, get back bounding boxes, point coordinates, or counts. Yes, it can literally count whiskers on a generated raccoon.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;container.create_web_artifact&lt;/strong&gt; — Generate HTML/JS artifacts or SVG graphics, rendered inline, Claude Artifacts-style.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;subagents.spawn_agent&lt;/strong&gt; — The sub-agent pattern. Spawn independent agents for research or delegation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;media.image_gen&lt;/strong&gt; — Image generation (likely Emu or an updated version) with artistic/realistic modes.&lt;/p&gt;

&lt;p&gt;And the rest: file editing tools, Meta content download, third-party account linking (Google Calendar, Outlook, Gmail).&lt;/p&gt;

&lt;h2&gt;Why the tools matter more than the model&lt;/h2&gt;

&lt;p&gt;Everyone's benchmarking Muse Spark against Opus 4.6 and GPT-5.4. That's the wrong comparison.&lt;/p&gt;

&lt;p&gt;The real competition isn't model quality — it's &lt;strong&gt;platform lock-in&lt;/strong&gt;. Meta's tools pull from your Instagram posts, let you manipulate images you generated, run Python against them, and spawn sub-agents. That's not a chatbot. That's an operating system for multimodal workflows.&lt;/p&gt;

&lt;p&gt;Claude has Artifacts. ChatGPT has Code Interpreter and DALL-E. Gemini has Deep Think and Workspace integration. Meta's play here is clear: they're not competing on reasoning benchmarks. They're competing on what you can &lt;em&gt;do&lt;/em&gt; without leaving the interface.&lt;/p&gt;

&lt;h2&gt;The efficiency claim&lt;/h2&gt;

&lt;p&gt;One detail that stuck out: Meta says Muse Spark reaches Llama 4 Maverick's capabilities with &lt;strong&gt;an order of magnitude less compute&lt;/strong&gt;. If that's true and they open-source future versions, the laptop-model landscape shifts again.&lt;/p&gt;

&lt;p&gt;Alexandr Wang tweeted that open-source plans exist. After Llama 3.1/3.2/3.3 became the default for local inference, pulling back to hosted-only would be a strange move. The model ecosystem still needs a serious open contender at this tier.&lt;/p&gt;

&lt;h2&gt;What I'm watching&lt;/h2&gt;

&lt;p&gt;Two things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;API pricing and rate limits&lt;/strong&gt; — The private preview tells us nothing. If Muse Spark launches at GPT-4-class pricing, it's a non-starter for most developers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Open weights timing&lt;/strong&gt; — If they ship Muse Spark (or a distilled variant) as downloadable weights, the local-first agent community gets a new default. If not, this is just Meta catching up to Anthropic's tool stack.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The model's fine. The tools are the product. The question is whether Meta wants to own your agent workflow or just lease it to you.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;The 16-tool breakdown and visual grounding examples come from Simon Willison's excellent writeup on his blog.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>meta</category>
      <category>agents</category>
    </item>
    <item>
      <title>The Tool Harness Meta Didn't Tell You About</title>
      <dc:creator>Aamer Mihaysi</dc:creator>
      <pubDate>Thu, 09 Apr 2026 11:19:58 +0000</pubDate>
      <link>https://dev.to/o96a/the-tool-harness-meta-didnt-tell-you-about-92h</link>
      <guid>https://dev.to/o96a/the-tool-harness-meta-didnt-tell-you-about-92h</guid>
      <description>&lt;p&gt;Meta just dropped Muse Spark, their first major model release in a year. The benchmarks show it competitive with Claude Opus 4.6 and GPT 5.4. But thats not the interesting part.&lt;/p&gt;

&lt;p&gt;What's interesting is what Simon Willison discovered when he started poking around the meta.ai interface. He asked a simple question: what tools do you have access to?&lt;/p&gt;

&lt;p&gt;The answer revealed 16 tools. And Meta didn't hide them.&lt;/p&gt;

&lt;h2&gt;The Tool Stack Nobody Mentioned&lt;/h2&gt;

&lt;p&gt;Here's what Meta quietly shipped:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Browser tools.&lt;/strong&gt; &lt;code&gt;browser.search&lt;/code&gt;, &lt;code&gt;browser.open&lt;/code&gt;, &lt;code&gt;browser.find&lt;/code&gt;. Web search through an undisclosed engine, page loading, and pattern matching against content. Basic but essential.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Meta content search.&lt;/strong&gt; &lt;code&gt;meta_1p.content_search&lt;/code&gt; can search Instagram, Threads, and Facebook posts semantically—but only for content the user can access, created since 2025-01-01. Parameters include &lt;code&gt;author_ids&lt;/code&gt;, &lt;code&gt;key_celebrities&lt;/code&gt;, &lt;code&gt;commented_by_user_ids&lt;/code&gt;, &lt;code&gt;liked_by_user_ids&lt;/code&gt;. That's a lot of filtering power.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Code Interpreter.&lt;/strong&gt; &lt;code&gt;container.python_execution&lt;/code&gt; runs Python 3.9 in a sandbox with pandas, numpy, matplotlib, plotly, scikit-learn, PyMuPDF, Pillow, OpenCV. Files persist at &lt;code&gt;/mnt/data/&lt;/code&gt;. Sound familiar? It's the same pattern ChatGPT and Claude use.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Web artifacts.&lt;/strong&gt; &lt;code&gt;container.create_web_artifact&lt;/code&gt; creates HTML+JavaScript files that render as sandboxed iframes. Set &lt;code&gt;kind&lt;/code&gt; to &lt;code&gt;html&lt;/code&gt; for apps or &lt;code&gt;svg&lt;/code&gt; for graphics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Visual grounding.&lt;/strong&gt; This one is fascinating. &lt;code&gt;container.visual_grounding&lt;/code&gt; analyzes images, identifies objects, and returns bounding boxes, points, or counts. It's Segment Anything as a tool—ask it to count whiskers on a raccoon and it outputs coordinates for each one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Subagent spawning.&lt;/strong&gt; &lt;code&gt;subagents.spawn_agent&lt;/code&gt; delegates tasks to independent sub-agents. The pattern Simon documented months ago is now a built-in tool.&lt;/p&gt;

&lt;h2&gt;Why This Matters&lt;/h2&gt;

&lt;p&gt;The model itself is fine. Artificial Analysis scores it at 52, behind only Gemini 3.1 Pro, GPT-5.4, and Claude Opus 4.6. Meta claims it matches Llama 4 Maverick with over an order of magnitude less compute.&lt;/p&gt;

&lt;p&gt;But the real story is the convergence. Every major AI company is arriving at the same tool architecture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python execution sandbox&lt;/li&gt;
&lt;li&gt;Web artifact rendering
&lt;/li&gt;
&lt;li&gt;File manipulation primitives (view, insert, str_replace)&lt;/li&gt;
&lt;li&gt;Visual analysis grounded in the sandbox&lt;/li&gt;
&lt;li&gt;Subagent delegation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Meta's implementation includes their own twist: tight integration with their social graph. That's a moat. That's data Claude and GPT can't access.&lt;/p&gt;

&lt;h2&gt;The Open Weights Question&lt;/h2&gt;

&lt;p&gt;Alexandr Wang hinted at open-sourcing future versions. Meta pioneered open weights with Llama. Then went closed with Llama 4. Now maybe back again?&lt;/p&gt;

&lt;p&gt;If they release Muse Spark weights, the tool harness becomes a reference implementation. Developers could replicate the meta.ai experience locally.&lt;/p&gt;

&lt;p&gt;But for now, it's hosted-only. Private API preview for select users. The tools work—but you're renting them, not owning them.&lt;/p&gt;

&lt;h2&gt;The Takeaway&lt;/h2&gt;

&lt;p&gt;The model race gets attention. The tool race matters more.&lt;/p&gt;

&lt;p&gt;Meta's 16-tool harness is sophisticated. Code Interpreter + visual grounding + subagent spawning + social graph search. That's a productivity stack, not a chatbot.&lt;/p&gt;

&lt;p&gt;Claude has similar capabilities. So do GPT and Gemini. We're not comparing models anymore. We're comparing tool ecosystems.&lt;/p&gt;

&lt;p&gt;And the companies building the best tools—not just the smartest models—will win.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>meta</category>
      <category>agents</category>
      <category>tools</category>
    </item>
    <item>
      <title>When AI Models Expose Their Tools: The Transparency Pattern Changing Agent Development</title>
      <dc:creator>Aamer Mihaysi</dc:creator>
      <pubDate>Thu, 09 Apr 2026 07:59:16 +0000</pubDate>
      <link>https://dev.to/o96a/when-ai-models-expose-their-tools-the-transparency-pattern-changing-agent-development-3f8g</link>
      <guid>https://dev.to/o96a/when-ai-models-expose-their-tools-the-transparency-pattern-changing-agent-development-3f8g</guid>
      <description>&lt;p&gt;Simon Willison pulled something interesting out of Meta's new Muse Spark model yesterday. By asking it directly, he got back the full list of 16 tools wired into Meta AI's chat harness. Not through documentation or API specs—just by asking the model itself.&lt;/p&gt;

&lt;p&gt;This matters more than it sounds. We're watching a shift in how AI platforms think about tool exposure, and it has implications for anyone building agents.&lt;/p&gt;

&lt;h2&gt;The Tool Transparency Pattern&lt;/h2&gt;

&lt;p&gt;ChatGPT has Code Interpreter. Claude has Artifacts and tool use. Gemini has live search and file analysis. Each platform keeps its tool chain proprietary—you know the tools exist, but good luck getting the exact schemas.&lt;/p&gt;

&lt;p&gt;Muse Spark takes a different approach. The tools aren't hidden:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;container.python_execution&lt;/code&gt; — Python 3.9 sandbox with pandas, numpy, matplotlib, OpenCV&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;container.visual_grounding&lt;/code&gt; — Segment Anything integration for object detection&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;container.create_web_artifact&lt;/code&gt; — HTML/JS artifacts in sandboxed iframes&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;subagents.spawn_agent&lt;/code&gt; — Delegate to sub-agents for research tasks&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;meta_1p.content_search&lt;/code&gt; — Search Instagram, Threads, Facebook posts you have access to&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;browser.search&lt;/code&gt;, &lt;code&gt;browser.open&lt;/code&gt;, &lt;code&gt;browser.find&lt;/code&gt; — Web browsing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And 10 more. Full parameter names, descriptions, and constraints. Available by just asking.&lt;/p&gt;

&lt;h2&gt;Why This Is Different&lt;/h2&gt;

&lt;p&gt;Most platforms treat tools as implementation details. You use them through the chat interface, but you don't get direct access to the tool schemas. Muse Spark's approach feels like a shift toward treating tools as first-class citizens.&lt;/p&gt;

&lt;p&gt;This matters for several reasons:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Debugging becomes possible.&lt;/strong&gt; When your agent does something unexpected, you can trace exactly which tool was called with what parameters. No more black-box debugging.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Tool composition emerges.&lt;/strong&gt; Once you know the tools, you can think about combining them in ways the platform didn't anticipate. Visual grounding → code interpreter → artifact creation creates pipelines.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Portability increases.&lt;/strong&gt; If tools are documented and stable, you can build workflows that survive model updates. Your agent logic doesn't break when Claude 5 ships because it's built on tool patterns, not prompt engineering.&lt;/p&gt;

&lt;h2&gt;The Ecosystem Effect&lt;/h2&gt;

&lt;p&gt;Simon's exploration shows what's possible when tools are accessible. He generated a raccoon photo, then used &lt;code&gt;visual_grounding&lt;/code&gt; to count whiskers, then analyzed the results with OpenCV—all within Meta's container.&lt;/p&gt;

&lt;p&gt;This is the composition pattern we've been waiting for. Not just "the model can use tools" but "the model can chain tools I didn't know existed."&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;subagents.spawn_agent&lt;/code&gt; tool is particularly interesting. It's the agent-as-tool pattern: spawn a research sub-agent, get back a final answer. This is how agents scale—not by getting smarter, but by delegating to specialized sub-components.&lt;/p&gt;

&lt;h2&gt;What We're Still Missing&lt;/h2&gt;

&lt;p&gt;Even with transparent tools, gaps remain:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tool discovery&lt;/strong&gt;: You still have to ask or probe to learn what's available. There's no standard tool registry.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Versioning&lt;/strong&gt;: What happens when &lt;code&gt;container.visual_grounding&lt;/code&gt; gains a &lt;code&gt;mask&lt;/code&gt; mode? Do old prompts break?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost visibility&lt;/strong&gt;: The tools are free to explore, but what's the token cost of spawning five sub-agents?&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;The Pattern Converges&lt;/h2&gt;

&lt;p&gt;We're seeing convergence across platforms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Code execution containers (OpenAI, Anthropic, now Meta)&lt;/li&gt;
&lt;li&gt;Artifact rendering (Claude Artifacts, Meta's HTML/SVG)&lt;/li&gt;
&lt;li&gt;Visual analysis (Claude's vision + tools, Meta's visual_grounding)&lt;/li&gt;
&lt;li&gt;Sub-agent spawning (Meta explicit, Claude via tool use)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The platforms that expose their tool schemas openly are giving developers a head start on building robust agent workflows. The ones that don't are creating lock-in through opacity.&lt;/p&gt;

&lt;h2&gt;The Takeaway&lt;/h2&gt;

&lt;p&gt;Meta's approach—letting the model describe its own tools—might be the honest path forward. No marketing fluff, no hidden capabilities. Just "here's what I can do, here's how to invoke it."&lt;/p&gt;

&lt;p&gt;If you're building agents today, this is your cue. Design for tool discovery. Build workflows that can adapt when new tools appear. And maybe stop trying to reverse-engineer tool schemas through prompt injection—just ask the model. Some of them will tell you.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;The convergence toward transparent tooling isn't just about openness. It's about composability. And the agents that win will be the ones that can compose tools they didn't know existed.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>llm</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Anthropic Just Did Something Unprecedented: They Hid Their Best Security Model</title>
      <dc:creator>Aamer Mihaysi</dc:creator>
      <pubDate>Thu, 09 Apr 2026 07:22:33 +0000</pubDate>
      <link>https://dev.to/o96a/anthropic-just-did-something-unprecedented-they-hid-their-best-security-model-3ma5</link>
      <guid>https://dev.to/o96a/anthropic-just-did-something-unprecedented-they-hid-their-best-security-model-3ma5</guid>
      <description>&lt;p&gt;Today Anthropic announced Claude Mythos — a model so good at finding security vulnerabilities that they decided not to release it.&lt;/p&gt;

&lt;p&gt;Instead, they launched Project Glasswing: a restricted program that gives access only to vetted security researchers and major tech companies. The model has already found vulnerabilities in &lt;strong&gt;every major operating system and web browser&lt;/strong&gt;, including a 27-year-old bug in OpenBSD.&lt;/p&gt;

&lt;p&gt;This is not a marketing stunt.&lt;/p&gt;

&lt;h2&gt;What Makes Mythos Different&lt;/h2&gt;

&lt;p&gt;Claude Opus 4.6 had a near-0% success rate at autonomous exploit development. Mythos Preview developed working exploits &lt;strong&gt;181 times&lt;/strong&gt; out of hundreds of attempts on the same test cases.&lt;/p&gt;

&lt;p&gt;Nicholas Carlini from Anthropic said he found more bugs in the last couple of weeks than in his entire career combined. That's not hyperbole — the OpenBSD vulnerability they discovered has been in the codebase for 27 years.&lt;/p&gt;

&lt;p&gt;The model can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Chain 4-5 vulnerabilities together into sophisticated exploit chains&lt;/li&gt;
&lt;li&gt;Write JIT heap sprays that escape browser AND OS sandboxes&lt;/li&gt;
&lt;li&gt;Find privilege escalation paths that humans missed for decades&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Why This Matters for Developers&lt;/h2&gt;

&lt;p&gt;This is a genuine inflection point. Security professionals are already drowning in AI-generated vulnerability reports:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Months ago, we were getting AI slop. Something happened a month ago, and the world switched. Now we have real reports.&lt;/em&gt; — Greg Kroah-Hartman, Linux kernel&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I'm spending hours per day on this now. It's intense.&lt;/em&gt; — Daniel Stenberg, curl&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The codebases running the internet's infrastructure — Linux, OpenBSD, browsers, servers — are being systematically audited by machines that don't get tired. The 27-year-old OpenBSD bug wasn't obscure — it was in TCP packet handling. Anyone could have found it. No one did. Until now.&lt;/p&gt;

&lt;h2&gt;The Trade-off&lt;/h2&gt;

&lt;p&gt;Anthropic is putting $100M in compute credits and $4M in direct donations behind Project Glasswing. Partners include AWS, Apple, Microsoft, Google, and the Linux Foundation.&lt;/p&gt;

&lt;p&gt;But the model won't be generally available. If you're not a vetted partner, you won't get access.&lt;/p&gt;

&lt;p&gt;Is that the right call?&lt;/p&gt;

&lt;p&gt;I think so. The security community has been warning about this for months. The gap between "AI can find bugs" and "AI can chain exploits autonomously" just closed. Anthropic is giving infrastructure maintainers time to harden their systems before the capability proliferates.&lt;/p&gt;

&lt;p&gt;Because it will proliferate. Other labs will reach this threshold. The question isn't whether bad actors eventually get this — it's whether critical systems get patched first.&lt;/p&gt;

&lt;h2&gt;What Comes Next&lt;/h2&gt;

&lt;p&gt;Two things are happening simultaneously:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Frontier models are becoming genuinely dangerous&lt;/strong&gt; — not in a sci-fi way, but in the boring sense that they can autonomously find and exploit vulnerabilities in production systems.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Responsible disclosure is scaling&lt;/strong&gt; — instead of one researcher finding one bug, we're seeing systematic auditing of entire codebases that have been running the internet for decades.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The maintainers of OpenBSD, Linux, curl, Firefox — they're all getting AI-generated reports now. Some are slop. Many are real.&lt;/p&gt;

&lt;p&gt;Project Glasswing is Anthropic acknowledging that releasing a model this capable, without guardrails, would be reckless. It's also an admission that they expect safeguards to catch up — eventually.&lt;/p&gt;




&lt;p&gt;The model is called Mythos. The name fits. It's powerful and elusive, and only a few will get to use it.&lt;/p&gt;

&lt;p&gt;But the bugs it's finding? Those are very real.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>anthropic</category>
      <category>llm</category>
    </item>
    <item>
      <title>Meta's New Model Has 16 Tools. Here's What They Do.</title>
      <dc:creator>Aamer Mihaysi</dc:creator>
      <pubDate>Thu, 09 Apr 2026 01:21:26 +0000</pubDate>
      <link>https://dev.to/o96a/metas-new-model-has-16-tools-heres-what-they-do-42m6</link>
      <guid>https://dev.to/o96a/metas-new-model-has-16-tools-heres-what-they-do-42m6</guid>
      <description>&lt;h1&gt;
  
  
  Meta's New Model Has 16 Tools. Here's What They Do.
&lt;/h1&gt;

&lt;p&gt;Meta just released &lt;strong&gt;Muse Spark&lt;/strong&gt;—their first model since Llama 4 almost exactly a year ago. It's competitive with GPT-5.4, Gemini 3.1 Pro, and Opus 4.6 on benchmarks. But that's not the interesting part.&lt;/p&gt;

&lt;p&gt;The interesting part is what's running under the hood at meta.ai.&lt;/p&gt;

&lt;p&gt;Simon Willison poked around and got the model to dump its entire tool catalog—&lt;strong&gt;16 tools with full parameter schemas&lt;/strong&gt;. Meta didn't hide them. No jailbreaks required. Just ask.&lt;/p&gt;

&lt;p&gt;Here's what Meta built that everyone else missed.&lt;/p&gt;




&lt;h2&gt;Code Interpreter, But Better&lt;/h2&gt;

&lt;p&gt;Python 3.9 sandbox with pandas, numpy, matplotlib, scikit-learn, OpenCV, Pillow. Files persist at &lt;code&gt;/mnt/data/&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The killer feature: you can generate an image with &lt;code&gt;media.image_gen&lt;/code&gt;, then immediately analyze it with OpenCV in the same container. Generate a raccoon, count its whiskers. Works.&lt;/p&gt;

&lt;h2&gt;Visual Grounding Baked In&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;container.visual_grounding&lt;/code&gt; takes an image path and object names, returns bounding boxes, point coordinates, or counts.&lt;/p&gt;

&lt;p&gt;This is &lt;strong&gt;Segment Anything integrated directly into the chat harness&lt;/strong&gt;. No separate API call. No context switching. Ask it to locate every piece of a raccoon's trash-hat outfit—it returns pixel-accurate boxes.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"object_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"raccoon whisker"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"count"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"object_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"raccoon paw claw"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"count"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"object_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"trash item on head"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"count"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Sub-Agent Spawning
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;subagents.spawn_agent&lt;/code&gt; lets Muse Spark delegate research or analysis to independent sub-agents. &lt;/p&gt;

&lt;p&gt;This is the &lt;strong&gt;sub-agent-as-tool pattern&lt;/strong&gt; that Claude Code uses—now exposed as a first-class capability in a consumer chat interface.&lt;/p&gt;
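&lt;p&gt;The pattern itself fits in a few lines. Nothing below is Meta's actual API; &lt;code&gt;spawn_agent&lt;/code&gt; here is a hypothetical stand-in that just echoes its task.&lt;/p&gt;

```python
# Hedged sketch of the sub-agent-as-tool pattern. In a real harness,
# spawn_agent would start a fresh model context with its own tools and
# return only a distilled result to the parent.

def spawn_agent(task: str) -> str:
    """Hypothetical stand-in: pretend a sub-agent researched the task."""
    return f"summary of: {task}"

def orchestrator(tasks: list[str]) -> list[str]:
    # The parent delegates each task, then reasons over the short
    # summaries instead of the sub-agents' full transcripts.
    return [spawn_agent(t) for t in tasks]

print(orchestrator(["compare GPU prices", "collect release notes"]))
```

The design win is context isolation: each sub-agent burns its own tokens on research, and only the summary comes back.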

&lt;h2&gt;
  
  
  Meta Content Search
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;meta_1p.content_search&lt;/code&gt; does semantic search across Instagram, Threads, and Facebook posts you have access to.&lt;/p&gt;

&lt;p&gt;Filter by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Author IDs&lt;/li&gt;
&lt;li&gt;Celebrity mentions&lt;/li&gt;
&lt;li&gt;Comments&lt;/li&gt;
&lt;li&gt;Likes&lt;/li&gt;
&lt;li&gt;Recency (e.g. posts since 2025-01-01)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;This is the moat.&lt;/strong&gt; No other frontier model has this. ChatGPT can't search your Instagram. Gemini can't see your Threads. Muse Spark can.&lt;/p&gt;

&lt;h2&gt;
  
  
  Third-Party Account Linking
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;third_party.link_third_party_account&lt;/code&gt; initiates linking for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Google Calendar&lt;/li&gt;
&lt;li&gt;Outlook Calendar&lt;/li&gt;
&lt;li&gt;Gmail&lt;/li&gt;
&lt;li&gt;Outlook&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The plumbing for the "AI assistant that manages your life" that every company is chasing.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Full Tool Catalog
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;browser.search&lt;/code&gt; - Web search&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;browser.open&lt;/code&gt; - Load full pages&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;browser.find&lt;/code&gt; - Pattern match in pages&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;meta_1p.content_search&lt;/code&gt; - Instagram/Threads/FB search&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;meta_1p.meta_catalog_search&lt;/code&gt; - Product catalog&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;media.image_gen&lt;/code&gt; - Image generation (artistic/realistic)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;container.python_execution&lt;/code&gt; - Code Interpreter&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;container.create_web_artifact&lt;/code&gt; - HTML/SVG rendering&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;container.download_meta_1p_media&lt;/code&gt; - Pull Meta content into sandbox&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;container.file_search&lt;/code&gt; - Search uploaded files&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;container.view&lt;/code&gt; / &lt;code&gt;container.insert&lt;/code&gt; / &lt;code&gt;container.str_replace&lt;/code&gt; - File editing&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;container.visual_grounding&lt;/code&gt; - Segment Anything&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;subagents.spawn_agent&lt;/code&gt; - Delegate to sub-agents&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;third_party.link_third_party_account&lt;/code&gt; - Calendar/email linking&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  The Pattern Is Clear
&lt;/h2&gt;

&lt;p&gt;Meta isn't just shipping a model. They're shipping a &lt;strong&gt;harness&lt;/strong&gt;—a full agent development environment disguised as a chat interface.&lt;/p&gt;

&lt;p&gt;The tools aren't plugins. They're baked in.&lt;/p&gt;

&lt;p&gt;And here's the thing: the efficiency claims are absurd. Meta says Muse Spark reaches Llama 4 Maverick capabilities with &lt;strong&gt;over an order of magnitude less compute&lt;/strong&gt;. If true, this changes the economics of frontier inference.&lt;/p&gt;

&lt;h2&gt;
  
  
  Open Weights?
&lt;/h2&gt;

&lt;p&gt;Alexandr Wang says there are "plans to open-source future versions."&lt;/p&gt;

&lt;p&gt;Not this one. Not yet.&lt;/p&gt;

&lt;p&gt;But the efficiency gains suggest laptop-scale models might be coming back to the open ecosystem.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Test
&lt;/h2&gt;

&lt;p&gt;It's not meta.ai. It's what developers can build with the API.&lt;/p&gt;

&lt;p&gt;Private preview only for now.&lt;/p&gt;

&lt;p&gt;But the tool catalog tells you exactly where Meta is going: &lt;strong&gt;not just model provider, but platform for agent work&lt;/strong&gt;.&lt;/p&gt;




&lt;p&gt;The model is the easy part.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The tools are the moat.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>meta</category>
      <category>llm</category>
      <category>agents</category>
    </item>
    <item>
      <title>Meta Just Revealed Its Agent Architecture. The Tool List Tells Us Everything.</title>
      <dc:creator>Aamer Mihaysi</dc:creator>
      <pubDate>Wed, 08 Apr 2026 23:20:14 +0000</pubDate>
      <link>https://dev.to/o96a/meta-just-revealed-its-agent-architecture-the-tool-list-tells-us-everything-53o6</link>
      <guid>https://dev.to/o96a/meta-just-revealed-its-agent-architecture-the-tool-list-tells-us-everything-53o6</guid>
      <description>&lt;p&gt;When Meta announced Muse Spark today—their first major model release since Llama 4 nearly a year ago—the benchmarks got most of the attention. But the real story wasn't in the model's performance numbers. It was in what Meta accidentally revealed about its agent strategy.&lt;/p&gt;

&lt;p&gt;The model itself is notable: hosted (not open weights), competitive with Opus 4.6, Gemini 3.1 Pro, and GPT-5.4 on selected benchmarks, though notably behind on coding workflows. Three modes are exposed: Instant, Thinking, and an upcoming Contemplating mode for deep reasoning.&lt;/p&gt;

&lt;p&gt;But here's what actually matters.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Tool Architecture Behind the Curtain
&lt;/h2&gt;

&lt;p&gt;When you ask Meta AI what tools it has access to—and push for exact names, parameters, and descriptions—it reveals something fascinating. Sixteen tools, each one a window into Meta's vision for what an AI assistant should actually &lt;em&gt;do&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Not summarize. Do.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's There
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Code Interpreter&lt;/strong&gt; (&lt;code&gt;container.python_execution&lt;/code&gt;): Python 3.9 with pandas, numpy, matplotlib, plotly, scikit-learn, PyMuPDF, Pillow, OpenCV. This is the same pattern as ChatGPT and Claude—a sandboxed environment where the model can execute code, analyze files, and build visualizations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Web Artifacts&lt;/strong&gt; (&lt;code&gt;container.create_web_artifact&lt;/code&gt;): HTML and SVG rendering directly in chat. Same pattern as Claude Artifacts. The model can create interactive content you can actually use.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Visual Grounding&lt;/strong&gt; (&lt;code&gt;container.visual_grounding&lt;/code&gt;): This one is interesting. It's essentially Segment Anything baked into the chat interface. You can ask it to identify objects, return bounding boxes, count things, or pinpoint locations in images. Generate a raccoon photo, then analyze it with OpenCV, then use visual grounding to label every component of its trash-hat ensemble. All in one conversation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Subagents&lt;/strong&gt; (&lt;code&gt;subagents.spawn_agent&lt;/code&gt;): Meta is explicitly embracing the sub-agent pattern. Spawn independent agents for research, analysis, or delegation. This is the architecture Anthropic teaches in its certification program—now Meta is shipping it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First-Party Content Search&lt;/strong&gt; (&lt;code&gt;meta_1p.content_search&lt;/code&gt;): Semantic search across Instagram, Threads, and Facebook posts. This is where Meta has an edge no other AI company can match. You're not just querying the web—you're querying Meta's entire social graph.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Account Linking&lt;/strong&gt; (&lt;code&gt;third_party.link_third_party_account&lt;/code&gt;): Google Calendar, Outlook Calendar, Gmail. The assistant can connect to your external services.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's Missing
&lt;/h3&gt;

&lt;p&gt;No file upload tool in the exposed list. No email send capability. No calendar write access. The account linking is &lt;em&gt;initiation only&lt;/em&gt;—it doesn't mean the assistant can actually manipulate your calendar yet.&lt;/p&gt;

&lt;p&gt;This suggests Meta is shipping incrementally. The foundation is there. The trust boundary is drawn. But the more invasive capabilities are still behind curtains.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Tells Us About the Agent Wars
&lt;/h2&gt;

&lt;p&gt;Three companies now have nearly identical tool architectures:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Capability&lt;/th&gt;
&lt;th&gt;OpenAI&lt;/th&gt;
&lt;th&gt;Anthropic&lt;/th&gt;
&lt;th&gt;Meta&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Code execution&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Web artifacts&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Visual analysis&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅ (grounding)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Subagents&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;First-party data&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅ (social)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Third-party integrations&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The convergence is striking. Everyone has landed on the same core: sandboxed code execution, rendered outputs, visual analysis, and some form of delegation.&lt;/p&gt;

&lt;p&gt;But Meta's differentiator is real. No other AI company can query your Instagram posts, your Threads engagement, your Facebook connections. That's not a small thing—that's the entire social graph attached to a reasoning engine.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Strategic Implications
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;For developers&lt;/strong&gt;: If you're building on LLMs, the tool pattern is now standard. Whatever you build should assume your users expect code execution, artifact rendering, and visual analysis. The bar has been raised.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For enterprises&lt;/strong&gt;: Meta's first-party data access is both an opportunity and a liability. Opportunity: richer context. Liability: richer context in Meta's hands.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For the competitive landscape&lt;/strong&gt;: This isn't a model war anymore. It's a platform war. Muse Spark's benchmarks matter less than the fact that Meta now has a complete stack: model + tools + proprietary data access + distribution through billions of users.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Question
&lt;/h2&gt;

&lt;p&gt;The model is fine. The benchmarks are competitive. The tools are extensive.&lt;/p&gt;

&lt;p&gt;But here's the question that matters: &lt;strong&gt;Will anyone trust Meta with their agent workflow?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Anthropic has spent years building trust around AI safety. OpenAI has enterprise partnerships. Meta has... Instagram.&lt;/p&gt;

&lt;p&gt;The tool architecture is impressive. The execution is solid. But trust is the currency of the agent economy, and Meta's account is overdrawn.&lt;/p&gt;

&lt;p&gt;That's the real story here. Not what Muse Spark can do. But whether anyone will let it do it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;The agent architecture convergence is happening faster than expected. Subscribe for weekly analysis on how AI platforms are reshaping what's possible.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>meta</category>
      <category>agents</category>
      <category>llm</category>
    </item>
    <item>
      <title>Anthropic Just Built a Model Too Dangerous to Release. They Called It Mythos.</title>
      <dc:creator>Aamer Mihaysi</dc:creator>
      <pubDate>Wed, 08 Apr 2026 21:20:02 +0000</pubDate>
      <link>https://dev.to/o96a/anthropic-just-built-a-model-too-dangerous-to-release-they-called-it-mythos-423n</link>
      <guid>https://dev.to/o96a/anthropic-just-built-a-model-too-dangerous-to-release-they-called-it-mythos-423n</guid>
      <description>&lt;p&gt;Today Anthropic announced Project Glasswing — and with it, a model they refuse to make generally available. Claude Mythos finds vulnerabilities that good.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The claim isn't hype.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In their own testing, Mythos wrote a browser exploit that chained four vulnerabilities together. It achieved local privilege escalation on Linux through subtle race conditions. It crafted a remote code execution exploit on FreeBSD's NFS server by splitting a 20-gadget ROP chain across multiple packets.&lt;/p&gt;

&lt;p&gt;Opus 4.6 managed a near-0% success rate on autonomous exploit development. On the same benchmark, Mythos succeeded on 181 of 200+ attempts.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Numbers That Matter
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;$100M&lt;/strong&gt; in usage credits for trusted partners&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;$4M&lt;/strong&gt; direct donations to open-source security orgs&lt;/li&gt;
&lt;li&gt;Partners include AWS, Apple, Microsoft, Google, and the Linux Foundation&lt;/li&gt;
&lt;li&gt;Vulnerabilities found in &lt;strong&gt;every major operating system and web browser&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;One OpenBSD bug had been there &lt;strong&gt;27 years&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why This Is Different
&lt;/h2&gt;

&lt;p&gt;We've been hearing about AI security research for a while. But the tone shifted recently.&lt;/p&gt;

&lt;p&gt;Greg Kroah-Hartman (Linux kernel): "Months ago, we were getting AI slop. Something happened a month ago, and the world switched. Now we have real reports."&lt;/p&gt;

&lt;p&gt;Daniel Stenberg (curl): "The challenge transitioned from an AI slop tsunami into more of a security report tsunami. Many of them really good."&lt;/p&gt;

&lt;p&gt;Thomas Ptacek: "Vulnerability Research Is Cooked", a post inspired by a podcast with Anthropic's Nicholas Carlini.&lt;/p&gt;

&lt;p&gt;Nicholas Carlini himself: "I've found more bugs in the last couple of weeks than I found in the rest of my life combined."&lt;/p&gt;

&lt;h2&gt;
  
  
  The Restriction
&lt;/h2&gt;

&lt;p&gt;This isn't a marketing stunt. Anthropic explicitly states:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;We do not plan to make Claude Mythos Preview generally available.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;They're developing safeguards with an upcoming Opus model first. The model can chain 3–5 vulnerabilities together. That's not something you casually release to the public.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Means
&lt;/h2&gt;

&lt;p&gt;The AI security research conversation just moved from theoretical to operational. The kernel maintainers are already seeing high-quality vulnerability reports from AI tools. Now Anthropic is saying they've built something that's genuinely too capable.&lt;/p&gt;

&lt;p&gt;The model exists. Others will build similar capabilities. The question is whether the industry can harden fast enough.&lt;/p&gt;

&lt;p&gt;Project Glasswing is Anthropic's answer: controlled access for trusted partners, funding for the open-source ecosystem, and time for the software industry to prepare.&lt;/p&gt;

&lt;p&gt;Whether that's sufficient — we'll find out.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>anthropic</category>
      <category>vulnerability</category>
    </item>
    <item>
      <title>Anthropic Found a Cheat Code for Enterprise AI. It Bought Customers Instead of Building Them.</title>
      <dc:creator>Aamer Mihaysi</dc:creator>
      <pubDate>Wed, 08 Apr 2026 20:58:33 +0000</pubDate>
      <link>https://dev.to/o96a/anthropic-found-a-cheat-code-for-enterprise-ai-it-bought-customers-instead-of-building-them-pnl</link>
      <guid>https://dev.to/o96a/anthropic-found-a-cheat-code-for-enterprise-ai-it-bought-customers-instead-of-building-them-pnl</guid>
      <description>&lt;h1&gt;
  
  
  When Your Go-to-Market Strategy Is Acqui-Hiring the Revenue
&lt;/h1&gt;

&lt;p&gt;Most enterprise AI companies spend years building sales teams, navigating procurement, and convincing CTOs to run pilots. Anthropic found a shortcut: buy the customers.&lt;/p&gt;

&lt;p&gt;The private-equity (PE) implementation playbook is simple in retrospect. Find companies that already have enterprise relationships, deploy Claude into their workflows, and let existing distribution channels do the work. No cold outreach. No proof-of-concept purgatory. Just follow the revenue.&lt;/p&gt;

&lt;p&gt;This isn't a partnership strategy. It's an acquisition strategy dressed as deployment.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Playbook Actually Does
&lt;/h2&gt;

&lt;p&gt;The mechanics are straightforward:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Identify PE-owned companies&lt;/strong&gt; with established enterprise customer bases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embed Claude&lt;/strong&gt; into core workflows (finance, operations, coding)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Leverage existing relationships&lt;/strong&gt; — the PE firm already owns the customer&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scale across the portfolio&lt;/strong&gt; — one deployment becomes ten, becomes fifty&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The PE firm gets AI capabilities for their companies. Anthropic gets instant enterprise distribution. The customer gets AI without the procurement headache.&lt;/p&gt;

&lt;p&gt;Everyone wins, but Anthropic wins most.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Works Better Than Traditional Enterprise Sales
&lt;/h2&gt;

&lt;p&gt;Enterprise AI sales are famously slow. Six-month pilots. Security reviews that never end. Budget cycles that kill momentum. By the time you close a deal, your model has already been superseded.&lt;/p&gt;

&lt;p&gt;The PE playbook bypasses all of it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No RFP process&lt;/strong&gt; — the relationship already exists&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No security review&lt;/strong&gt; — the PE firm has already vetted the portfolio company&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No budget negotiation&lt;/strong&gt; — it's portfolio optimization, not new spend&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No pilot phase&lt;/strong&gt; — deployment starts immediately&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Anthropic isn't selling AI. It's selling portfolio efficiency.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Coinbase Signal
&lt;/h2&gt;

&lt;p&gt;Coinbase getting a national trust bank charter is part of the same pattern. The crypto company didn't pursue banking because it wanted to be a bank. It pursued banking because banking is the fastest path to institutional trust.&lt;/p&gt;

&lt;p&gt;The PE playbook is the same logic applied to AI distribution. Don't build distribution from scratch. Buy the companies that already have it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Means for the AI Market
&lt;/h2&gt;

&lt;p&gt;The companies winning enterprise AI right now aren't the ones with the best models. They're the ones with the best distribution.&lt;/p&gt;

&lt;p&gt;OpenAI has Microsoft. Google has Google Cloud. Anthropic has... a growing list of PE relationships, enterprise partnerships, and now direct customer ownership through deployment.&lt;/p&gt;

&lt;p&gt;The model quality matters, but the distribution advantage compounds. Every enterprise deployment teaches the model new use cases. Every use case makes the next deployment easier. The flywheel isn't technical — it's commercial.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bigger Picture
&lt;/h2&gt;

&lt;p&gt;We're watching enterprise AI consolidate faster than most people realize. The window where a startup could compete on model quality alone is closing. The next phase is about who owns the customer relationship.&lt;/p&gt;

&lt;p&gt;Anthropic's PE playbook is one answer: skip the line, buy the customers, deploy through existing channels.&lt;/p&gt;

&lt;p&gt;It's not subtle. But it's working.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;The companies that will dominate enterprise AI aren't necessarily the ones building the best technology. They're the ones who figured out that distribution — not R&amp;amp;D — is the bottleneck.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>anthropic</category>
      <category>enterprise</category>
      <category>strategy</category>
    </item>
    <item>
      <title>Ramp Built a CLI for AI Agents. Visa Should Be Worried.</title>
      <dc:creator>Aamer Mihaysi</dc:creator>
      <pubDate>Wed, 08 Apr 2026 18:59:35 +0000</pubDate>
      <link>https://dev.to/o96a/ramp-built-a-cli-for-ai-agents-visa-should-be-worried-21ka</link>
      <guid>https://dev.to/o96a/ramp-built-a-cli-for-ai-agents-visa-should-be-worried-21ka</guid>
      <description>&lt;h1&gt;
  
  
  The Payment Layer Agents Were Missing
&lt;/h1&gt;

&lt;p&gt;Ramp just shipped something that shouldn't work. A command-line interface for AI agents to spend money.&lt;/p&gt;

&lt;p&gt;Sounds like a security nightmare. Sounds like a solution looking for a problem. But spend five minutes thinking about what agents actually need to do their jobs, and it becomes obvious: &lt;strong&gt;this is the missing payment layer for autonomous software&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Ramp Actually Built
&lt;/h2&gt;

&lt;p&gt;Ramp's CLI gives AI agents the ability to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Query balances and transaction history&lt;/li&gt;
&lt;li&gt;Initiate payments to vendors&lt;/li&gt;
&lt;li&gt;Approve and reject expense requests&lt;/li&gt;
&lt;li&gt;Manage corporate cards programmatically&lt;/li&gt;
&lt;li&gt;Do all of it from a terminal, via API, without human approval gates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key insight: agents don't need a GUI. They need structured input/output, clear permissions, and audit trails. A CLI is the perfect interface for that.&lt;/p&gt;
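&lt;p&gt;That structured-I/O point is worth making concrete. The command name and JSON shape below are invented for illustration; they are not Ramp's actual CLI:&lt;/p&gt;

```python
# Why structured CLI output suits agents: shell out, parse JSON, branch
# on typed fields. The command and payload shape are hypothetical.
import json

# In practice this string would come from something like:
#   subprocess.run(["ramp", "balance", "--json"], capture_output=True)
fake_cli_output = '{"balance": 12500.0, "currency": "USD"}'

def can_spend(raw: str, amount: float) -> bool:
    """Decide whether a proposed spend fits the available balance."""
    acct = json.loads(raw)
    return amount <= acct["balance"]

print(can_spend(fake_cli_output, 5000.0))  # True
```

A GUI would force the agent to screen-scrape this decision; a JSON-emitting CLI makes it one parse and one comparison.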

&lt;h2&gt;
  
  
  Why This Matters More Than It Sounds
&lt;/h2&gt;

&lt;p&gt;Every AI agent startup hits the same wall eventually: the agent can plan, reason, and execute code, but it can't complete transactions. It can draft an email, but not send it. It can propose a purchase, but not approve it. It can find the cheapest vendor, but not pay them.&lt;/p&gt;

&lt;p&gt;The result: humans stay in the loop as payment clerks. The agent does 90% of the work, then hands off to a human for the final 10% — the part that actually moves money.&lt;/p&gt;

&lt;p&gt;Ramp's CLI removes that bottleneck. Now the agent can complete the entire workflow.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Visa Connection
&lt;/h2&gt;

&lt;p&gt;Visa's AI strategy has been about being the "bouncer" — checking IDs when agents buy things. The Ramp CLI is the opposite approach: it's giving agents their own corporate cards and letting them use them directly.&lt;/p&gt;

&lt;p&gt;Same destination, different route. Both companies see that agents need payment autonomy. Visa wants to be the gatekeeper. Ramp wants to be the card issuer.&lt;/p&gt;

&lt;p&gt;The difference matters. A gatekeeper slows things down. A card issuer speeds things up.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Real Agent Payments Look Like
&lt;/h2&gt;

&lt;p&gt;Imagine an engineering team's agent:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Detects a production incident at 2am&lt;/li&gt;
&lt;li&gt;Spins up additional cloud infrastructure&lt;/li&gt;
&lt;li&gt;Pays the extra compute costs from a pre-approved budget&lt;/li&gt;
&lt;li&gt;Logs the transaction for the morning review&lt;/li&gt;
&lt;li&gt;Scales back down when traffic normalizes&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;All autonomous. All audited. All without waking a human.&lt;/p&gt;

&lt;p&gt;That's the promise. Agents that don't just recommend — they act. And action, in a corporate context, usually means spending money.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Security Model
&lt;/h2&gt;

&lt;p&gt;Ramp didn't just hand agents unlimited corporate cards. The CLI works within Ramp's existing expense management framework:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Per-agent budgets&lt;/li&gt;
&lt;li&gt;Category restrictions&lt;/li&gt;
&lt;li&gt;Approval thresholds&lt;/li&gt;
&lt;li&gt;Full audit logs&lt;/li&gt;
&lt;li&gt;Instant revocation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The agent can spend, but only within bounds the company already defined. If it goes rogue, you can see exactly what it did and shut it down.&lt;/p&gt;
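&lt;p&gt;A policy check like that is only a few lines of code. This sketch is my own reconstruction of the idea (a budget cap plus a category allow-list), not Ramp's implementation:&lt;/p&gt;

```python
# Hedged reconstruction of a per-agent spend policy: a budget cap plus
# a category allow-list, checked before any payment goes out.

POLICY = {"budget": 500.0, "categories": {"cloud", "saas"}}

def authorize(amount: float, category: str, already_spent: float) -> bool:
    """Approve a spend only if it stays in budget AND in an allowed category."""
    within_budget = already_spent + amount <= POLICY["budget"]
    category_ok = category in POLICY["categories"]
    return within_budget and category_ok

print(authorize(120.0, "cloud", already_spent=300.0))   # True
print(authorize(120.0, "travel", already_spent=300.0))  # False: category blocked
print(authorize(250.0, "saas", already_spent=300.0))    # False: over budget
```

Every denial is a logged event, which is what makes the "see exactly what it did and shut it down" guarantee possible.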

&lt;h2&gt;
  
  
  What Comes Next
&lt;/h2&gt;

&lt;p&gt;The question isn't whether agents will spend money. It's whether that spending will happen through:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Human approval workflows (slow, safe, limited)&lt;/li&gt;
&lt;li&gt;Agent-native payment APIs like Ramp's CLI (fast, auditable, controlled)&lt;/li&gt;
&lt;li&gt;Agent-specific cards with spending limits (the Visa approach)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The winner will be whoever makes it easiest to give agents financial autonomy without giving them the company treasury.&lt;/p&gt;

&lt;p&gt;Ramp's bet: the CLI is the interface agents actually want. Not a card to tap. Not a human to approve. A command to run.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;The first company to solve agent payments wins the toll booth for autonomous commerce. Ramp just showed what that looks like.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>fintech</category>
      <category>payments</category>
    </item>
    <item>
      <title>GLM-5.1: The 754B Open Model That Writes Animated SVG</title>
      <dc:creator>Aamer Mihaysi</dc:creator>
      <pubDate>Wed, 08 Apr 2026 17:23:06 +0000</pubDate>
      <link>https://dev.to/o96a/glm-51-the-754b-open-model-that-writes-animated-svg-3cle</link>
      <guid>https://dev.to/o96a/glm-51-the-754b-open-model-that-writes-animated-svg-3cle</guid>
      <description>&lt;h1&gt;
  
  
  GLM-5.1: The 754B Open Model That Writes Animated SVG
&lt;/h1&gt;

&lt;p&gt;GLM-5.1 just landed from Z.ai - a 754B-parameter, 1.51 TB, MIT-licensed model available on Hugging Face and via OpenRouter.&lt;/p&gt;

&lt;p&gt;Same size as their GLM-5 release, but with a twist: it generates animated SVG with CSS.&lt;/p&gt;

&lt;p&gt;Simon Willison ran his pelican test. Most models produce static graphics. GLM-5.1 generated a full HTML page with &lt;strong&gt;CSS animations&lt;/strong&gt; - the pelican's beak wobbles, wheels spin.&lt;/p&gt;

&lt;p&gt;When the animation broke positioning, the model diagnosed the problem:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"The issue is that CSS transform animations on SVG elements override the SVG transform attribute used for positioning, causing the pelican to lose its placement. The fix is to separate positioning from animation and use  for SVG rotations."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And then it &lt;strong&gt;fixed it&lt;/strong&gt;.&lt;/p&gt;
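&lt;p&gt;The diagnosis is real CSS behavior: a CSS &lt;code&gt;transform&lt;/code&gt; property wins over the SVG &lt;code&gt;transform&lt;/code&gt; attribute, because presentation attributes sit at the bottom of the cascade. The fix pattern looks roughly like this (my own minimal reconstruction, not the model's actual output):&lt;/p&gt;

```html
<!-- Positioning lives on the OUTER group as an SVG attribute; the CSS
     animation touches only the INNER group, so the two transforms no
     longer fight over the same element. -->
<svg viewBox="0 0 100 100" xmlns="http://www.w3.org/2000/svg">
  <style>
    .wobble {
      animation: wobble 1s ease-in-out infinite alternate;
      transform-box: fill-box;      /* rotate around the shape itself */
      transform-origin: left center;
    }
    @keyframes wobble { to { transform: rotate(8deg); } }
  </style>
  <g transform="translate(40 40)">  <!-- positioning: attribute only -->
    <g class="wobble">              <!-- animation: CSS only -->
      <polygon points="0,0 22,3 0,6" fill="orange"/> <!-- the beak -->
    </g>
  </g>
</svg>
```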

&lt;p&gt;This matters because most open-weight releases chase benchmarks. GLM-5.1 shows competence in a different domain: understanding that graphics exist in a rendering context, not just as static output.&lt;/p&gt;

&lt;p&gt;The pelican test proxies whether a model understands that code runs somewhere - SVG has coordinate systems, CSS has cascade rules, "fix it" means understanding both.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What to watch:&lt;/strong&gt; MIT-licensed at 754B parameters is unusually permissive for a model this size. If inference costs drop, GLM-5.1 becomes the "good enough" baseline for anyone not renting from OpenAI or Anthropic.&lt;/p&gt;

&lt;p&gt;The test is not whether it draws pelicans. It's whether organizations ship it in production because the license lets them.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>opensource</category>
    </item>
    <item>
      <title>The Chinese Open-Source Model That Draws Pelicans Better Than GPT-4o</title>
      <dc:creator>Aamer Mihaysi</dc:creator>
      <pubDate>Wed, 08 Apr 2026 17:22:15 +0000</pubDate>
      <link>https://dev.to/o96a/the-chinese-open-source-model-that-draws-pelicans-better-than-gpt-4o-1l61</link>
      <guid>https://dev.to/o96a/the-chinese-open-source-model-that-draws-pelicans-better-than-gpt-4o-1l61</guid>
      <description>&lt;h1&gt;
  
  
  The Chinese Open-Source Model That Draws Pelicans Better Than GPT-4o
&lt;/h1&gt;

&lt;p&gt;GLM-5.1 just landed from Z.ai - a 754B-parameter, 1.51 TB, MIT-licensed model that is free on Hugging Face and available via OpenRouter.&lt;/p&gt;

&lt;p&gt;The model is the same size as their previous GLM-5 release, but something changed in how it handles creative tasks.&lt;/p&gt;

&lt;p&gt;Simon Willison ran his pelican test (asking models to draw a pelican on a bicycle as SVG). Most models produce static graphics. GLM-5.1 did something unexpected: it generated a full HTML page with &lt;strong&gt;CSS animations&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The pelican's beak has a wobble animation. The wheels spin. It's not perfect - the animation broke positioning on the first try - but when prompted to fix it, the model correctly diagnosed the problem:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"The issue is that CSS transform animations on SVG elements override the SVG transform attribute used for positioning, causing the pelican to lose its placement and fly off to the top-right. The fix is to separate positioning (SVG attribute) from animation (inner group) and use  for SVG rotations since it handles coordinate systems correctly."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And then it &lt;strong&gt;fixed it&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This is different from most open-weight releases. The Chinese AI ecosystem has been catching up on benchmarks, but GLM-5.1 shows competence in a domain few models touch: understanding that graphics exist in a rendering context, not just as static output.&lt;/p&gt;

&lt;p&gt;The pelican test is not just cute. It's a proxy for whether a model understands that code runs somewhere - that SVG has coordinate systems, CSS has cascade rules, and "fix it" means understanding both.&lt;/p&gt;

&lt;p&gt;What to watch: MIT-licensed at 754B parameters is unusually permissive for a model this size. If inference costs continue dropping, GLM-5.1 becomes the "good enough" baseline for anyone who does not want to rent from OpenAI or Anthropic.&lt;/p&gt;

&lt;p&gt;The real test is not whether it draws pelicans. It's whether organizations start shipping it in production because the license lets them.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
