<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Yuval Avidani</title>
    <description>The latest articles on DEV Community by Yuval Avidani (@yuval_avidani_d8354e6f91a).</description>
    <link>https://dev.to/yuval_avidani_d8354e6f91a</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3761447%2Fd1410328-3059-4081-bf43-ebb4d9ad124a.png</url>
      <title>DEV Community: Yuval Avidani</title>
      <link>https://dev.to/yuval_avidani_d8354e6f91a</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/yuval_avidani_d8354e6f91a"/>
    <language>en</language>
    <item>
      <title>I Spent 48 Hours Red-Teaming the "Magic AI Assistant" Everyone's Hyping. Here's What I Found.</title>
      <dc:creator>Yuval Avidani</dc:creator>
      <pubDate>Mon, 09 Feb 2026 08:49:53 +0000</pubDate>
      <link>https://dev.to/yuval_avidani_d8354e6f91a/i-spent-48-hours-red-teaming-the-magic-ai-assistant-everyones-hyping-heres-what-i-found-5dh6</link>
      <guid>https://dev.to/yuval_avidani_d8354e6f91a/i-spent-48-hours-red-teaming-the-magic-ai-assistant-everyones-hyping-heres-what-i-found-5dh6</guid>
      <description>&lt;h2&gt;
  
  
  &lt;strong&gt;TL;DR:&lt;/strong&gt; I tore apart OpenClaw - the open-source AI assistant that promises to run on "any OS, any platform" across 19 messaging channels. I found 10 exploitable vulnerabilities, a supply chain that depends on one person's npm account, a WhatsApp integration that could get you banned, and an architecture that wastes 93% of your token spend. All backed by code, line numbers, and dollar amounts (your $40 conversation could cost $2.70 instead).
&lt;/h2&gt;

&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;I'm Yuval Avidani. I break things for a living.&lt;/p&gt;

&lt;p&gt;When OpenClaw started trending - "your own personal AI assistant, the lobster way 🦞" - I did what any security researcher would do: I cloned the repo and started reading.&lt;/p&gt;

&lt;p&gt;330,000 lines of TypeScript. 1,156 npm dependencies. 22 tools. 19 messaging channels. 15+ LLM providers.&lt;/p&gt;

&lt;p&gt;Impressive scope. But scope is where bugs hide.&lt;/p&gt;

&lt;p&gt;So I pulled out my red team toolkit, set a timer, and went hunting. What I found wasn't pretty.&lt;/p&gt;




&lt;h2&gt;
  
  
  Finding #1: I Can Write Files Anywhere on Your System
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Severity: HIGH (CVSS 7.5)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;OpenClaw lets you install "skills" - community plugins that extend its capabilities. When you install a skill from a tarball, here's what happens:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/agents/skills-install.ts, lines 255-279&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;argv&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;tar&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;xf&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;archivePath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;-C&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;targetDir&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. No path validation. No &lt;code&gt;../&lt;/code&gt; prevention. Nothing.&lt;/p&gt;

&lt;p&gt;A malicious skill author can craft an archive with entries like &lt;code&gt;../../../.bashrc&lt;/code&gt; or &lt;code&gt;../../../.ssh/authorized_keys&lt;/code&gt;. When you install it, the file gets written outside the skill directory - directly onto your filesystem.&lt;/p&gt;

&lt;p&gt;This is called a &lt;strong&gt;Zip Slip attack&lt;/strong&gt;. It was &lt;a href="https://snyk.io/research/zip-slip-vulnerability" rel="noopener noreferrer"&gt;publicly disclosed in 2018&lt;/a&gt; and affects hundreds of projects. OpenClaw is now one of them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Impact:&lt;/strong&gt; Remote code execution. A popular skill maintainer goes rogue (or gets their account compromised), pushes a poisoned update, and every user who updates gets owned.&lt;/p&gt;
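&lt;p&gt;For reference, a minimal sketch of the missing check. It assumes the installer can list the archive's entry names before extracting (for example from &lt;code&gt;tar tf&lt;/code&gt;); &lt;code&gt;isEntryInsideTarget&lt;/code&gt; and &lt;code&gt;findUnsafeEntries&lt;/code&gt; are hypothetical helpers, not OpenClaw functions. It only covers path traversal by name, so symlink entries would need separate handling.&lt;/p&gt;

```typescript
import * as path from "node:path";

// Hypothetical helper: true only if the entry would land inside targetDir.
function isEntryInsideTarget(targetDir: string, entryName: string): boolean {
  const root = path.resolve(targetDir);
  const resolved = path.resolve(root, entryName);
  // path.relative starts with ".." whenever resolved escapes root.
  const rel = path.relative(root, resolved);
  if (rel === "" || rel === "..") return false;
  if (rel.startsWith(".." + path.sep)) return false;
  // On Windows a different drive yields an absolute relative path.
  return !path.isAbsolute(rel);
}

// Hypothetical helper: collect every entry that should block the install.
function findUnsafeEntries(targetDir: string, entryNames: string[]): string[] {
  return entryNames.filter((name) => !isEntryInsideTarget(targetDir, name));
}

// Example: two of these three entries escape the skill directory.
console.log(
  findUnsafeEntries("/tmp/skills/demo", ["src/index.ts", "../../../.bashrc", "/etc/passwd"]),
);
```

&lt;p&gt;Rejecting the whole archive when this list is non-empty is the conservative choice, since a single escaping entry is enough to overwrite a shell profile.&lt;/p&gt;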




&lt;h2&gt;
  
  
  Finding #2: The Security Scanner Is Theater
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Severity: HIGH (CVSS 7.2)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"But wait," you say. "OpenClaw has a skill scanner that catches malicious code!"&lt;/p&gt;

&lt;p&gt;Yes. It does. And it's trivially bypassed.&lt;/p&gt;

&lt;p&gt;The scanner at &lt;code&gt;src/security/skill-scanner.ts&lt;/code&gt; (442 lines) uses &lt;strong&gt;regex patterns&lt;/strong&gt; to look for dangerous APIs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// It looks for this:&lt;/span&gt;
&lt;span class="nx"&gt;child_process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;rm -rf /&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;// But not this:&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;child&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;_process&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;cp&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ex&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ec&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;rm -rf /&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Or this:&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;global&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;eval&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="nf"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;require('child_process').exec('...')&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every single rule can be bypassed through:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic property access:&lt;/strong&gt; &lt;code&gt;obj["ev" + "al"]()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Indirect requires:&lt;/strong&gt; &lt;code&gt;const m = module.constructor._load("child_process")&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Template literals:&lt;/strong&gt; &lt;code&gt;require(`child_${"process"}`)&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prototype chain access:&lt;/strong&gt; &lt;code&gt;Object.getPrototypeOf(process).constructor&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Pattern-based scanning without AST (Abstract Syntax Tree) analysis is like a bouncer who only checks IDs that say "FAKE" on them.&lt;/p&gt;
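&lt;p&gt;To make the bypass concrete, here is a toy scanner in the same spirit. The three patterns are my own illustration, not OpenClaw's actual rule set; the point is only that string concatenation defeats any rule that matches on source text.&lt;/p&gt;

```typescript
// Hypothetical, simplified rules. NOT the real OpenClaw rule set.
const DANGEROUS_PATTERNS: RegExp[] = [
  /child_process/,
  /\beval\s*\(/,
  /\.exec\s*\(/,
];

// Returns true when any pattern matches the raw source text.
function regexScan(source: string): boolean {
  return DANGEROUS_PATTERNS.some((pattern) => pattern.test(source));
}

const obvious = 'require("child_process").exec("id");';
const obfuscated = 'const cp = require("child" + "_process"); cp["ex" + "ec"]("id");';

console.log(regexScan(obvious));    // true
console.log(regexScan(obfuscated)); // false
```

&lt;p&gt;Both snippets run the same command. Only the first one is flagged.&lt;/p&gt;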




&lt;h2&gt;
  
  
  Finding #3: Your Conversations Leak Between Sessions
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Severity: MEDIUM (CVSS 6.5)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here's a fun one. In &lt;code&gt;src/config/types.base.ts&lt;/code&gt;, line 84:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nx"&gt;dmScope&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;main&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All DM conversations default to the same scope: &lt;code&gt;"main"&lt;/code&gt;. In a multi-user deployment — which is exactly what OpenClaw encourages with its multi-channel architecture — this means User A's private DM conversation can bleed into User B's context.&lt;/p&gt;

&lt;p&gt;Memory searches, conversation history, tool results — all potentially shared across what users believe are private conversations.&lt;/p&gt;

&lt;p&gt;Not a theoretical risk. A configuration default.&lt;/p&gt;
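&lt;p&gt;A rough sketch of why the default matters. Only the &lt;code&gt;"main"&lt;/code&gt; value comes from the code above; the &lt;code&gt;"per-sender"&lt;/code&gt; value, the &lt;code&gt;InboundDm&lt;/code&gt; shape, and &lt;code&gt;sessionKeyFor&lt;/code&gt; are hypothetical names I made up to illustrate the idea.&lt;/p&gt;

```typescript
// Hypothetical types; OpenClaw's real session model may differ.
type DmScope = "main" | "per-sender";

interface InboundDm {
  channel: string;  // e.g. "telegram"
  senderId: string; // stable per-user identifier from the channel
}

// Derive the key under which history and memory are stored.
function sessionKeyFor(scope: DmScope, dm: InboundDm): string {
  if (scope === "main") {
    // Every DM, from every sender, lands in one shared session.
    return "main";
  }
  // One isolated session per (channel, sender) pair.
  return ["dm", dm.channel, dm.senderId].join(":");
}

const alice: InboundDm = { channel: "telegram", senderId: "alice" };
const bob: InboundDm = { channel: "telegram", senderId: "bob" };

console.log(sessionKeyFor("main", alice) === sessionKeyFor("main", bob));             // true
console.log(sessionKeyFor("per-sender", alice) === sessionKeyFor("per-sender", bob)); // false
```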




&lt;h2&gt;
  
  
  Finding #4: WhatsApp Credentials in Plaintext
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Severity: MEDIUM (CVSS 6.2)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/web/auth-store.ts, lines 19-24&lt;/span&gt;
&lt;span class="c1"&gt;// Credentials stored as plaintext JSON on disk&lt;/span&gt;
&lt;span class="c1"&gt;// Path: creds.json&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your WhatsApp session credentials — the keys that let OpenClaw act as you on WhatsApp — are stored as unencrypted JSON on disk. No file permissions check. No encryption at rest. Anyone with read access to the filesystem can impersonate your WhatsApp account.&lt;/p&gt;
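&lt;p&gt;The cheapest mitigation is owner-only file permissions. Below is a minimal sketch assuming a POSIX filesystem; &lt;code&gt;writeSecretFile&lt;/code&gt; and &lt;code&gt;isOwnerOnly&lt;/code&gt; are hypothetical helpers, and the demo path is a throwaway temp file. This does not replace encryption at rest, which also needs a key store such as the OS keychain.&lt;/p&gt;

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: write a secret so only the owner can read or write it.
function writeSecretFile(filePath: string, contents: string): void {
  // The mode option only applies on creation, so chmod covers existing files.
  fs.writeFileSync(filePath, contents, { mode: 0o600 });
  fs.chmodSync(filePath, 0o600);
}

// Hypothetical helper: verify the permission bits before trusting the file.
function isOwnerOnly(filePath: string): boolean {
  // The low nine bits of st_mode are the rwx permission bits.
  const permissionBits = fs.statSync(filePath).mode % 0o1000;
  return permissionBits === 0o600;
}

const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "creds-demo-"));
const demoPath = path.join(demoDir, "creds.json");
writeSecretFile(demoPath, JSON.stringify({ example: true }));
console.log(isOwnerOnly(demoPath));
```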




&lt;h2&gt;
  
  
  Finding #5: The Timing Attack on Authentication
&lt;/h2&gt;

&lt;p&gt;Here's something subtle. In &lt;code&gt;src/gateway/server-http.ts&lt;/code&gt;, line 160:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;hookToken&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="nx"&gt;expectedToken&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="c1"&gt;// ← Standard !== comparison&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And in &lt;code&gt;src/gateway/auth.ts&lt;/code&gt;, lines 35-40:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;// ← Leaks token length via timing&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;crypto&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;timingSafeEqual&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;bufA&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;bufB&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;They &lt;em&gt;almost&lt;/em&gt; got it right. They use &lt;code&gt;timingSafeEqual&lt;/code&gt; for the byte comparison — but they leak the token length by returning early when lengths don't match. An attacker can determine the exact length of your auth token by measuring response times.&lt;/p&gt;

&lt;p&gt;The hook token comparison is even worse — plain &lt;code&gt;!==&lt;/code&gt; is fully vulnerable to character-by-character timing attacks.&lt;/p&gt;
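&lt;p&gt;One common way to close both gaps is to hash each value to a fixed-length digest before comparing, so the comparison never branches on the raw token length. A minimal sketch using Node's built-in &lt;code&gt;crypto&lt;/code&gt; module; &lt;code&gt;safeTokenEqual&lt;/code&gt; is a hypothetical name, not the project's function.&lt;/p&gt;

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Hypothetical replacement for both comparisons shown above.
function safeTokenEqual(provided: string, expected: string): boolean {
  // SHA-256 digests are always 32 bytes, so timingSafeEqual never
  // sees mismatched lengths and no early return is needed.
  const a = createHash("sha256").update(provided, "utf8").digest();
  const b = createHash("sha256").update(expected, "utf8").digest();
  return timingSafeEqual(a, b);
}

console.log(safeTokenEqual("s3cret-token", "s3cret-token")); // true
console.log(safeTokenEqual("short", "s3cret-token"));        // false
```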




&lt;h2&gt;
  
  
  The Supply Chain: One Person's npm Account Controls Everything
&lt;/h2&gt;

&lt;p&gt;This is the finding that keeps me up at night.&lt;/p&gt;

&lt;p&gt;OpenClaw's core runtime (the agent loop, the prompt builder, the API communicator, the session manager) is split across four npm packages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"@mariozechner/pi-agent-core"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"0.52.9"&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"@mariozechner/pi-ai"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"0.52.9"&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"@mariozechner/pi-coding-agent"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"0.52.9"&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"@mariozechner/pi-tui"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"0.52.9"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These are &lt;strong&gt;personal namespace packages&lt;/strong&gt; from a single npm account. Not an organization. Not a foundation. One person.&lt;/p&gt;

&lt;p&gt;Here's why that matters:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Account compromise = supply chain attack.&lt;/strong&gt; If &lt;code&gt;@mariozechner&lt;/code&gt;'s npm account gets phished, hacked, or credential-stuffed, an attacker can push malicious versions of the packages that power &lt;em&gt;every&lt;/em&gt; OpenClaw installation worldwide.&lt;/p&gt;&lt;/li&gt;
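&lt;li&gt;&lt;p&gt;&lt;strong&gt;No peer review on publishes.&lt;/strong&gt; An npm organization can spread publish rights across several vetted maintainers and enforce two-factor authentication for all of them. A personal scope depends entirely on one account's security hygiene.&lt;/p&gt;&lt;/li&gt;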
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Bus factor = 1.&lt;/strong&gt; One person gets sick, loses interest, or sells their npm account? Every OpenClaw user is affected.&lt;/p&gt;&lt;/li&gt;
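&lt;li&gt;&lt;p&gt;&lt;strong&gt;You can't easily change the core.&lt;/strong&gt; Want to add prompt caching to cut system prompt costs by 10x? Want to add token budgets? Too bad. That logic lives in third-party packages outside the OpenClaw repository, so in practice the core is a black box you would have to fork or patch.&lt;/p&gt;&lt;/li&gt;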
&lt;/ol&gt;

&lt;p&gt;The npm ecosystem has seen this movie before: &lt;code&gt;event-stream&lt;/code&gt;, &lt;code&gt;ua-parser-js&lt;/code&gt;, &lt;code&gt;colors.js&lt;/code&gt;. In each case a single maintainer account, whether handed over, hijacked, or sabotaged by its own owner, pushed a bad release to a huge number of downstream projects.&lt;/p&gt;




&lt;h2&gt;
  
  
  The WhatsApp Problem Nobody Talks About
&lt;/h2&gt;

&lt;p&gt;OpenClaw uses &lt;code&gt;@whiskeysockets/baileys&lt;/code&gt; (v7.0.0-rc.9) for WhatsApp integration.&lt;/p&gt;

&lt;p&gt;Let me be blunt: &lt;strong&gt;Baileys is a reverse-engineered implementation of WhatsApp's private protocol.&lt;/strong&gt; It's not an official API. It's not sanctioned by Meta. Using it violates WhatsApp's Terms of Service.&lt;/p&gt;

&lt;p&gt;What happens when you use Baileys:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Risk&lt;/th&gt;
&lt;th&gt;Consequence&lt;/th&gt;
&lt;th&gt;Likelihood&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Account ban&lt;/td&gt;
&lt;td&gt;Meta detects non-official client, permanently bans your number&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;High&lt;/strong&gt; - Meta actively detects Baileys&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Protocol break&lt;/td&gt;
&lt;td&gt;WhatsApp updates their protocol, Baileys stops working&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;High&lt;/strong&gt; — happens regularly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Credential theft&lt;/td&gt;
&lt;td&gt;Baileys needs your full session keys (not just a bot token)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Built-in&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No SLA&lt;/td&gt;
&lt;td&gt;Community-maintained, RC version, no support contract&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Guaranteed&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The official alternative? &lt;strong&gt;WhatsApp Business Cloud API.&lt;/strong&gt; Free tier. Official. Won't get you banned. But it requires a business account and webhook setup - effort that OpenClaw chose not to invest in.&lt;/p&gt;




&lt;h2&gt;
  
  
  The $40 Conversation: Why OpenClaw Bleeds Your Wallet
&lt;/h2&gt;

&lt;p&gt;This is where it gets expensive.&lt;/p&gt;

&lt;p&gt;I traced a typical 40-turn developer conversation through OpenClaw's architecture and calculated the token spend at every layer. The numbers are staggering.&lt;/p&gt;

&lt;h3&gt;
  
  
  How OpenClaw burns tokens:
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. No history limit (quadratic cost growth)&lt;/strong&gt;&lt;br&gt;
Every message sends the &lt;em&gt;entire&lt;/em&gt; conversation history, so per-turn input keeps growing and the cumulative bill grows roughly quadratically. Turn 1 sends 500 tokens. Turn 50 sends 70,000 tokens. Total input cost for a 100-turn conversation: &lt;strong&gt;$50-80&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/agents/pi-embedded-runner/history.ts, lines 15-36&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;limit&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;limit&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// Returns EVERYTHING&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
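&lt;p&gt;A sketch of what a cap buys you. &lt;code&gt;limitHistory&lt;/code&gt; and &lt;code&gt;totalInputTokens&lt;/code&gt; are hypothetical helpers, and the flat 1,000 tokens per turn is an illustrative assumption, not a measurement from OpenClaw.&lt;/p&gt;

```typescript
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Hypothetical sliding window: keep only the most recent maxMessages entries.
function limitHistory(messages: ChatMessage[], maxMessages: number): ChatMessage[] {
  if (!(maxMessages > 0)) throw new RangeError("maxMessages must be positive");
  if (maxMessages >= messages.length) return messages;
  return messages.slice(messages.length - maxMessages);
}

// Total input tokens over a conversation when each turn re-sends the
// history so far, optionally capped at windowTurns turns.
function totalInputTokens(turns: number, tokensPerTurn: number, windowTurns?: number): number {
  let total = 0;
  for (let turn = turns; turn > 0; turn--) {
    const turnsSent = windowTurns === undefined ? turn : Math.min(turn, windowTurns);
    total += turnsSent * tokensPerTurn;
  }
  return total;
}

console.log(totalInputTokens(100, 1000));     // 5050000 (unbounded, quadratic)
console.log(totalInputTokens(100, 1000, 20)); // 1810000 (20-turn window, linear)
```

&lt;p&gt;Without a window the total scales with the square of the turn count. With one it scales linearly once the window is full.&lt;/p&gt;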



&lt;p&gt;&lt;strong&gt;2. Context overflow costs 9-22x more than prevention&lt;/strong&gt;&lt;br&gt;
When the 200K context fills up, OpenClaw:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pays for the failed request ($3.00)&lt;/li&gt;
&lt;li&gt;Makes 2-3 additional API calls to summarize ($2.84)&lt;/li&gt;
&lt;li&gt;Retries the original request ($0.90)&lt;/li&gt;
&lt;li&gt;Total for one overflow event: &lt;strong&gt;$6.74&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A proactively managed system: &lt;strong&gt;$0.30-0.75&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. A 648-line system prompt sent every single time&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;src/agents/system-prompt.ts&lt;/code&gt; - 648 lines, ~5,000 tokens - sent on every request without prompt caching. Cost: &lt;strong&gt;$225/month&lt;/strong&gt;. With Anthropic's native prompt caching: &lt;strong&gt;$22.50/month&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Memory search fires on every turn - even for "thanks"&lt;/strong&gt;&lt;br&gt;
2,400 tokens injected per turn from memory search results. Even when you just type "thanks." Cost: &lt;strong&gt;$108/month&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Opus by default - the Ferrari for grocery runs&lt;/strong&gt;&lt;br&gt;
The default model is Claude Opus 4.6 at $15/MTok input. Sonnet 4.5 does 90% of tasks identically at $3/MTok. Switching saves &lt;strong&gt;$1,080/month&lt;/strong&gt;.&lt;/p&gt;
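&lt;p&gt;The monthly figures for the system prompt and memory search can be reproduced with simple arithmetic. Two assumptions here are mine: roughly 3,000 requests per month (the volume at which the numbers above line up) and cached reads billed at 10% of the base input price. The one-time cache-write premium is ignored.&lt;/p&gt;

```typescript
// Dollars per million input tokens, as quoted above for Opus.
const OPUS_INPUT_PER_MTOK = 15;

// Assumption: about 3,000 requests per month.
const REQUESTS_PER_MONTH = 3000;

// Monthly cost of sending tokensPerRequest input tokens on every request.
function monthlyCost(tokensPerRequest: number, pricePerMTok: number, requests: number): number {
  return (tokensPerRequest / 1e6) * pricePerMTok * requests;
}

const systemPrompt = monthlyCost(5000, OPUS_INPUT_PER_MTOK, REQUESTS_PER_MONTH);
// Assumption: cache reads cost one tenth of the base input price.
const systemPromptCached = monthlyCost(5000, OPUS_INPUT_PER_MTOK * 0.1, REQUESTS_PER_MONTH);
const memorySearch = monthlyCost(2400, OPUS_INPUT_PER_MTOK, REQUESTS_PER_MONTH);

console.log(systemPrompt.toFixed(2));       // "225.00"
console.log(systemPromptCached.toFixed(2)); // "22.50"
console.log(memorySearch.toFixed(2));       // "108.00"
```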
&lt;h3&gt;
  
  
  The real math:
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;What You Pay (OpenClaw)&lt;/th&gt;
&lt;th&gt;What You'd Pay (Optimized)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;$40.03 per 40-turn conversation&lt;/td&gt;
&lt;td&gt;$2.70 per conversation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;$3,603/month (3 conversations/day)&lt;/td&gt;
&lt;td&gt;$243/month&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;$43,231/year&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$2,916/year&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That's &lt;strong&gt;$40,315/year&lt;/strong&gt; in waste for a medium-usage deployment.&lt;/p&gt;

&lt;p&gt;And every model cost in the config defaults to &lt;strong&gt;zero&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/config/defaults.ts, lines 28-33&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;DEFAULT_MODEL_COST&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;// ← Zero!&lt;/span&gt;
  &lt;span class="na"&gt;output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;// ← Zero!&lt;/span&gt;
  &lt;span class="na"&gt;cacheRead&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;cacheWrite&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The system literally cannot tell you how much you're spending.&lt;/p&gt;




&lt;h2&gt;
  
  
  22 Tools, All Dumping Into One Bottomless Context
&lt;/h2&gt;

&lt;p&gt;OpenClaw registers 22 tools - file operations, web search, shell execution, browser control, messaging, image analysis, and more. Every tool result gets injected into the conversation context and &lt;strong&gt;stays there forever&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Here's what a debugging session looks like:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Turn&lt;/th&gt;
&lt;th&gt;What Happens&lt;/th&gt;
&lt;th&gt;Tokens Added&lt;/th&gt;
&lt;th&gt;Running Total&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;You ask a question&lt;/td&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Agent greps your codebase&lt;/td&gt;
&lt;td&gt;3,000&lt;/td&gt;
&lt;td&gt;3,050&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Agent reads 2 files&lt;/td&gt;
&lt;td&gt;4,000&lt;/td&gt;
&lt;td&gt;7,050&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Agent searches the web&lt;/td&gt;
&lt;td&gt;5,000&lt;/td&gt;
&lt;td&gt;12,050&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Agent fetches a docs page&lt;/td&gt;
&lt;td&gt;15,000&lt;/td&gt;
&lt;td&gt;27,050&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;Still growing...&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;32,600&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;By turn 10, every subsequent message ships 32,600 tokens of accumulated context, most of it stale tool output. At Opus pricing that is &lt;strong&gt;$0.49 per turn&lt;/strong&gt;, largely to re-transmit results like the &lt;code&gt;grep&lt;/code&gt; output from turn 2.&lt;/p&gt;

&lt;p&gt;A properly built system would:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Summarize tool results before storing (15K web fetch → 500-token summary)&lt;/li&gt;
&lt;li&gt;Expire tool results after N turns&lt;/li&gt;
&lt;li&gt;Use a retrieval store, not the LLM context window&lt;/li&gt;
&lt;/ul&gt;
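&lt;p&gt;The expiry rule is the simplest of the three to sketch. The &lt;code&gt;ContextEntry&lt;/code&gt; shape and &lt;code&gt;expireToolResults&lt;/code&gt; are hypothetical; the token counts are the first five rows of the table above.&lt;/p&gt;

```typescript
interface ContextEntry {
  turn: number;                    // turn on which the entry was added
  kind: "message" | "tool_result";
  tokens: number;                  // estimated token count
}

// Hypothetical pruning pass: drop tool results older than maxAgeTurns,
// but always keep ordinary messages.
function expireToolResults(entries: ContextEntry[], currentTurn: number, maxAgeTurns: number): ContextEntry[] {
  return entries.filter((entry) => {
    if (entry.kind !== "tool_result") return true;
    return maxAgeTurns >= currentTurn - entry.turn;
  });
}

function totalTokens(entries: ContextEntry[]): number {
  return entries.reduce((sum, entry) => sum + entry.tokens, 0);
}

const debugSession: ContextEntry[] = [
  { turn: 1, kind: "message", tokens: 50 },
  { turn: 2, kind: "tool_result", tokens: 3000 },
  { turn: 3, kind: "tool_result", tokens: 4000 },
  { turn: 4, kind: "tool_result", tokens: 5000 },
  { turn: 5, kind: "tool_result", tokens: 15000 },
];

console.log(totalTokens(debugSession));                           // 27050
console.log(totalTokens(expireToolResults(debugSession, 10, 5))); // 15050
```

&lt;p&gt;With a 5-turn expiry, only the turn-5 fetch survives at turn 10. Summarizing that 15,000-token page before storing it would shrink the context further.&lt;/p&gt;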




&lt;h2&gt;
  
  
  19 Channels × Separate Sessions = Token Multiplication
&lt;/h2&gt;

&lt;p&gt;OpenClaw supports 19 messaging channels. Each runs a &lt;strong&gt;completely independent session&lt;/strong&gt; with its own system prompt, conversation history, and memory search.&lt;/p&gt;

&lt;p&gt;Same user. Same assistant. Three channels. Three separate token streams:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Channel&lt;/th&gt;
&lt;th&gt;System Prompt&lt;/th&gt;
&lt;th&gt;History&lt;/th&gt;
&lt;th&gt;Memory&lt;/th&gt;
&lt;th&gt;Per Turn&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;WhatsApp&lt;/td&gt;
&lt;td&gt;5,000&lt;/td&gt;
&lt;td&gt;20,000&lt;/td&gt;
&lt;td&gt;2,400&lt;/td&gt;
&lt;td&gt;27,400&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Telegram&lt;/td&gt;
&lt;td&gt;5,000&lt;/td&gt;
&lt;td&gt;15,000&lt;/td&gt;
&lt;td&gt;2,400&lt;/td&gt;
&lt;td&gt;22,400&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Slack&lt;/td&gt;
&lt;td&gt;5,000&lt;/td&gt;
&lt;td&gt;10,000&lt;/td&gt;
&lt;td&gt;2,400&lt;/td&gt;
&lt;td&gt;17,400&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;67,200&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;At Opus pricing, that's &lt;strong&gt;$1.01 per turn&lt;/strong&gt; across 3 channels. For 50 messages/day: &lt;strong&gt;$1,512/month&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;One user. One context. That's the fix.&lt;/p&gt;




&lt;h2&gt;
  
  
  What OpenClaw Actually Gets Right
&lt;/h2&gt;

&lt;p&gt;I'm a security researcher, not a hater. Credit where it's due:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SSRF Protection&lt;/strong&gt; - &lt;code&gt;src/infra/net/fetch-guard.ts&lt;/code&gt; implements DNS pinning, redirect validation (limit of 3), loop detection, and protocol enforcement. This is solid defensive engineering.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Secret Scanning&lt;/strong&gt; — Integrated &lt;code&gt;detect-secrets&lt;/code&gt; in CI/CD. Has a &lt;code&gt;.secrets.baseline&lt;/code&gt; and &lt;code&gt;.detect-secrets.cfg&lt;/code&gt;. Real commitment to preventing credential leaks in code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt Injection Defense&lt;/strong&gt; - &lt;code&gt;src/security/external-content.ts&lt;/code&gt; wraps external content with boundary markers and security warnings. Pattern-based detection for common injection attempts. Not bulletproof, but genuine effort.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Docker Hardening&lt;/strong&gt; - Runs as non-root user, supports &lt;code&gt;--read-only&lt;/code&gt; and &lt;code&gt;--cap-drop=ALL&lt;/code&gt;. Follows container security best practices.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dependency Hygiene&lt;/strong&gt; - &lt;code&gt;onlyBuiltDependencies&lt;/code&gt; allowlist and &lt;code&gt;minimumReleaseAge: 2880&lt;/code&gt; (48 hours) to prevent install-time attacks from brand-new package versions.&lt;/p&gt;

&lt;p&gt;These aren't trivial. Someone on this project cares about security. The problems are architectural, not attitudinal.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Fix: What a Properly Architected Solution Looks Like
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Problem&lt;/th&gt;
&lt;th&gt;OpenClaw&lt;/th&gt;
&lt;th&gt;Proper Solution&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;History&lt;/td&gt;
&lt;td&gt;Unlimited (quadratic cost)&lt;/td&gt;
&lt;td&gt;Sliding window (15-20 turns)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;System prompt&lt;/td&gt;
&lt;td&gt;Sent raw every time ($225/mo)&lt;/td&gt;
&lt;td&gt;Cached ($22.50/mo)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tool results&lt;/td&gt;
&lt;td&gt;Persist forever in context&lt;/td&gt;
&lt;td&gt;Summarized, expired after 5 turns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model&lt;/td&gt;
&lt;td&gt;Opus for everything ($1,350/mo)&lt;/td&gt;
&lt;td&gt;Routed: Haiku/Sonnet/Opus ($270/mo)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Context overflow&lt;/td&gt;
&lt;td&gt;React after failure ($6.74)&lt;/td&gt;
&lt;td&gt;Prevent proactively ($0.30)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory search&lt;/td&gt;
&lt;td&gt;Every turn ($108/mo)&lt;/td&gt;
&lt;td&gt;Only when relevant&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-channel&lt;/td&gt;
&lt;td&gt;Separate sessions per channel&lt;/td&gt;
&lt;td&gt;Shared context per user&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost tracking&lt;/td&gt;
&lt;td&gt;All zeros&lt;/td&gt;
&lt;td&gt;Real pricing, real budgets&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Core runtime&lt;/td&gt;
&lt;td&gt;Opaque npm packages&lt;/td&gt;
&lt;td&gt;Direct SDK integration&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  So, Should You Use OpenClaw?
&lt;/h2&gt;

&lt;p&gt;For &lt;strong&gt;tinkering, learning, and local experiments&lt;/strong&gt; - sure. It's a fascinating project with impressive breadth. 19 channels, 22 tools, 15+ providers. That's ambitious.&lt;/p&gt;

&lt;p&gt;For &lt;strong&gt;anything involving real data, real users, or real money&lt;/strong&gt; — not without significant hardening. The Zip Slip alone is a showstopper. The supply chain risk is a dealbreaker for enterprise. And the token economics will eat your budget alive.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My recommendations:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Don't install community skills&lt;/strong&gt; until Zip Slip is patched&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Don't use it for WhatsApp&lt;/strong&gt; unless you're okay with account bans&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Switch the default model to Sonnet&lt;/strong&gt; immediately (save 80%)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Set a history limit&lt;/strong&gt; in your config&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Never deploy the gateway&lt;/strong&gt; on a public network&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit the &lt;code&gt;@mariozechner&lt;/code&gt; packages&lt;/strong&gt; before production use&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Full Report
&lt;/h2&gt;

&lt;p&gt;The complete technical audit (all 10 vulnerabilities with exploitation steps, the full supply chain breakdown, token cost simulations, and architecture comparison) is available at:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/hoodini/openclaw" rel="noopener noreferrer"&gt;github.com/hoodini/openclaw&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The repo includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;README.md&lt;/code&gt; - Full red team report with CVSS scores and code evidence&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;README_tokens.md&lt;/code&gt; - Deep-dive token economics analysis with cost tables&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  About the Author
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Yuval Avidani&lt;/strong&gt; is a security researcher and developer based in Israel. He believes that open-source projects deserve honest, evidence-based analysis - not hype.&lt;/p&gt;

&lt;p&gt;Follow on GitHub: &lt;a href="https://github.com/hoodini" rel="noopener noreferrer"&gt;@hoodini&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Audit performed against OpenClaw v2026.2.6-3, commit &lt;code&gt;c984e6d8d&lt;/code&gt; on branch &lt;code&gt;main&lt;/code&gt;. 330,000 lines of TypeScript. 1,156 dependencies. All findings verified against source code.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Responsible disclosure: The OpenClaw project's SECURITY.md explicitly lists "Prompt injection attacks" as out of scope and states there is no bug bounty program. This audit was performed on publicly available source code.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;If this saved you from a $40K/year surprise, share it.&lt;/strong&gt; The next person evaluating OpenClaw for their company deserves to see the numbers.
&lt;/h2&gt;

</description>
      <category>openclaw</category>
      <category>llm</category>
      <category>opensource</category>
      <category>clawdbot</category>
    </item>
  </channel>
</rss>
