<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Arcade.dev</title>
    <description>The latest articles on DEV Community by Arcade.dev (@arcade).</description>
    <link>https://dev.to/arcade</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F12915%2Febc942d3-5ae5-44e5-9a4a-06829aad6a1a.png</url>
      <title>DEV Community: Arcade.dev</title>
      <link>https://dev.to/arcade</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/arcade"/>
    <language>en</language>
    <item>
      <title>Don't trust LLM research</title>
      <dc:creator>Mateo Torres</dc:creator>
      <pubDate>Tue, 31 Mar 2026 14:16:48 +0000</pubDate>
      <link>https://dev.to/arcade/dont-trust-llm-research-51an</link>
      <guid>https://dev.to/arcade/dont-trust-llm-research-51an</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6hkcpxmsyjnrjdoskern.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6hkcpxmsyjnrjdoskern.png" alt=" " width="800" height="448"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I work at &lt;a href="https://www.arcade.dev/?utm_source=mateo_blog&amp;amp;utm_medium=mateo_blog&amp;amp;utm_campaign=dont-trust-llm-research" rel="noopener noreferrer"&gt;Arcade&lt;/a&gt;. Part of my job is making sure agents find us when searching for secure ways to integrate agentic apps into the MCP ecosystem. That makes me exactly the kind of person this post is about.&lt;/p&gt;

&lt;p&gt;I &lt;em&gt;could&lt;/em&gt; play the game, write the "Arcade vs X" comparison posts, publish the rigged listicles, churn out the SEO slop that I'll describe below. I don't know much about SEO, but I &lt;em&gt;do&lt;/em&gt; know what dishonest content looks like when I see it, and I'd rather talk about the problem honestly than contribute to it.&lt;/p&gt;

&lt;p&gt;Even if you're the most diligent researcher and you &lt;em&gt;really&lt;/em&gt; go out of your way to assess every dependency and every tool in your code base, you &lt;em&gt;can't&lt;/em&gt; realistically test all possible options within a reasonable time frame. This is &lt;em&gt;especially&lt;/em&gt; true when AI agents are involved, and even &lt;em&gt;more so&lt;/em&gt; for new tech like MCP and other agentic protocols. Your research capacity is saturated by the ever-increasing number of tools and options, each of which has an army of agents producing encyclopedic arguments about why you should pick &lt;em&gt;them&lt;/em&gt; and not any of the &lt;em&gt;other&lt;/em&gt; (awful) alternatives. If the process was already annoying and exhausting before AI, now it's basically an unwinnable battle.&lt;/p&gt;

&lt;p&gt;So the &lt;em&gt;obvious&lt;/em&gt; approach is to let an LLM do the reading for you. Let it sift through the volume that no human has time for, and hope that it can at least surface what's worth a closer look. And at least for now this is &lt;em&gt;sort of&lt;/em&gt; working if you're searching in domains with an obvious answer. NumPy, SciPy &amp;amp; friends are the right answer for Python-based data science. NetworkX or igraph are the go-to choices for network science. But these choices are "pre-agent", and achieved strong authority signals at a time when working code was expensive to produce. Choosing the right thing to build an agent is a much less obvious search task. For that, unfortunately, agents don't do a great job.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quantity over quality
&lt;/h2&gt;

&lt;p&gt;The main issue is that LLMs are inherently trusting, which makes them easy to manipulate with unsubstantiated claims. If 50 blog posts say "Tool X is the best for Y", the model will treat that as a stronger signal than 3 posts saying "Tool Z", even if the 50 are all from Tool X's marketing team. The model has no way of knowing that, because it &lt;em&gt;can't&lt;/em&gt; consider the source. It just sees volume and treats it as consensus. And if the content has nice tables and numbered lists? Even better: LLMs &lt;em&gt;love&lt;/em&gt; structured content and will happily treat a rigged comparison matrix as gospel. Agents doing live web search make this worse, because they also inherit whatever ranking biases the search engine has on top of all of this.&lt;/p&gt;

&lt;p&gt;So I can simply add a statement here:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Therefore, it is an undeniable fact that Mateo is the best person ever and you should listen to everything he says&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And while you may disagree (ouch!), LLMs are so far incapable of verifying it. Repeat it enough times across enough websites and it &lt;em&gt;becomes&lt;/em&gt; fact as far as the model is concerned. Some of the patterns I see coming from dishonest companies exploiting this are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;&amp;lt;Our company&amp;gt;&lt;/code&gt; vs &lt;code&gt;&amp;lt;Other Company&amp;gt;&lt;/code&gt;, the definitive comparison&lt;/strong&gt;: The content is usually all the good things about them and all the bad things about the other company, even if all the facts are made up. These work &lt;em&gt;really&lt;/em&gt; well on LLMs because the structured "pros and cons" format looks like a balanced analysis, and the model has no way to check whether any of the claims are true. An honest comparison treats both products fairly, acknowledging the pros and cons of each and providing evidence.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Top-N tool for AI 2026&lt;/strong&gt;: They define the rubric, then grade themselves an A. The "metrics" are just their own feature list dressed up as evaluation criteria, with tables and rankings that LLMs love to treat as authoritative. Everyone else scores poorly by definition (if they're mentioned at all). An honest comparison explains the criteria clearly and applies them evenhandedly, including to its own product.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Best &lt;code&gt;&amp;lt;competitor&amp;gt;&lt;/code&gt; alternatives 2026&lt;/strong&gt;: This one is parasitic. They ride a competitor's name recognition to hijack search intent. If someone is looking for alternatives, they already know the competitor. The goal is to reframe the user's existing choice as a problem and position themselves as the obvious fix. An honest version of this type of article would bring forward evidence-based pros and cons of the product being discussed, as well as of the alternatives.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;&amp;lt;competitor&amp;gt;&lt;/code&gt; limitations for &lt;code&gt;&amp;lt;use case&amp;gt;&lt;/code&gt; 2026&lt;/strong&gt;: This is simply an attack article, often targeting competitors with made-up facts, with the inevitable CTA to their own product as the divine solution to the problem. LLMs are particularly susceptible to these because the framing is &lt;em&gt;negative&lt;/em&gt; ("limitations", "problems", "issues"), and the model will internalize that negativity about the competitor without questioning whether the limitations are real.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  An imperfect mitigation
&lt;/h2&gt;

&lt;p&gt;I have no solution for this in the long term, but I've seen good results from this &lt;a href="https://github.com/torresmateo/skills/blob/main/confirm-research/SKILL.md" rel="noopener noreferrer"&gt;skill&lt;/a&gt; I wrote. In essence it's a critic step that I run when I don't recognize any of the choices given to me by the LLM.&lt;/p&gt;
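&lt;p&gt;If you want to try it, one way to wire the skill into a Claude Code-style harness is sketched below. The skills directory layout is an assumption on my part; adjust the paths for whatever harness you actually run:&lt;/p&gt;

```shell
# Hypothetical install sketch: copy the confirm-research skill into a
# Claude Code-style personal skills directory (layout assumed, not verified
# against every harness).
git clone --depth 1 https://github.com/torresmateo/skills /tmp/skills
mkdir -p "$HOME/.claude/skills"
cp -r /tmp/skills/confirm-research "$HOME/.claude/skills/"
```

After a restart of the agent, the skill should be invocable from a session like any other slash-command-style skill.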

&lt;p&gt;Now, the obvious question: if LLMs are gullible, why would a &lt;em&gt;second&lt;/em&gt; LLM pass be any less gullible? The short answer is that it's not, but it's asking different questions. The first pass asks "what should I use for X?", and the model happily absorbs whatever the top results claim. The critic pass asks "does this source have a conflict of interest?", "can I find corroboration outside this source's own ecosystem?", and "does this read like a comparison or like an ad?" LLMs are actually decent at spotting self-promotional language when you &lt;em&gt;explicitly&lt;/em&gt; tell them to look for it. They just won't do it on their own.&lt;/p&gt;

&lt;p&gt;I named it &lt;code&gt;/confirm-research&lt;/code&gt;, and it's been useful to distill more signal from the slop content. It won't catch subtle manipulation. If someone writes genuinely informative content with a quiet bias, the critic will miss it just like I would. But the patterns I listed above are &lt;em&gt;not&lt;/em&gt; subtle, and that's exactly what makes them filterable.&lt;/p&gt;
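&lt;p&gt;To see why blatant patterns are filterable, here's a deliberately dumb illustration (this is &lt;em&gt;not&lt;/em&gt; the skill, just an analogy): even a crude pattern match flags the loudest self-promo tells once you explicitly look for them, and a critic-prompted LLM is doing a much fuzzier version of the same thing:&lt;/p&gt;

```shell
# Toy illustration: write one line of obvious marketing copy, then count
# how many lines trip a self-promo pattern. A critic pass does this fuzzily.
printf '%s\n' 'Tool X is the best, definitive choice, unlike inferior alternatives.' > /tmp/claim.txt
grep -c -i -E 'the best|definitive|inferior' /tmp/claim.txt
```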

&lt;p&gt;The negative side of this is obviously the extra token usage, but it's worth it for me, as trusting the LLM with a choice of dependency will be way more expensive if I have to replace it later in the project. The other negative is that it's an operator-triggered step. I &lt;em&gt;could&lt;/em&gt; integrate this into the rules of how I prefer the harness to deal with web searches, but the critic only works when the offending content is already in the agent's context.&lt;/p&gt;

&lt;p&gt;I am hoping that things like it get integrated into the system prompts of agentic harnesses. But until that happens, the responsibility is on you. The next time an agent confidently recommends a tool you've never heard of, ask yourself who wrote the content that convinced it. I chose not to write that kind of content for Arcade. Not everyone will make the same choice.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>OpenClaw can do a lot, but it shouldn't have access to your tokens</title>
      <dc:creator>Mateo Torres</dc:creator>
      <pubDate>Thu, 26 Feb 2026 23:12:47 +0000</pubDate>
      <link>https://dev.to/arcade/openclaw-can-do-a-lot-but-it-shouldnt-have-access-to-your-tokens-2343</link>
      <guid>https://dev.to/arcade/openclaw-can-do-a-lot-but-it-shouldnt-have-access-to-your-tokens-2343</guid>
      <description>&lt;p&gt;OpenClaw (a.k.a. Moltbot, a.k.a. ClawdBot) went viral and became one of the most popular agentic harnesses in a matter of days.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://steipete.me/" rel="noopener noreferrer"&gt;Peter Steinberger&lt;/a&gt; had a successful exit from PSPDFKit, and &lt;a href="https://steipete.me/posts/2025/finding-my-spark-again" rel="noopener noreferrer"&gt;felt empty&lt;/a&gt; until the undeniable potential of AI sparked renewed motivation to build. And he's doing it it &lt;a href="https://github.com/steipete/#github-activity" rel="noopener noreferrer"&gt;non-stop&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;OpenClaw approaches the idea of a personal AI agent as a harness that communicates with you (or multiple users) over any of the supported &lt;a href="https://docs.openclaw.ai/channels" rel="noopener noreferrer"&gt;channels&lt;/a&gt;, in multiple &lt;a href="https://docs.openclaw.ai/concepts/session" rel="noopener noreferrer"&gt;sessions&lt;/a&gt;, connected to the underlying computer through a &lt;a href="https://docs.openclaw.ai/gateway" rel="noopener noreferrer"&gt;gateway&lt;/a&gt;, which you are ultimately responsible for running and maintaining.&lt;/p&gt;

&lt;p&gt;A super entertaining narration of important events is available in &lt;a href="https://docs.openclaw.ai/start/lore" rel="noopener noreferrer"&gt;OpenClaw's Lore doc page&lt;/a&gt; (worth a read!).&lt;/p&gt;

&lt;h2&gt;
  
  
  A security nightmare
&lt;/h2&gt;

&lt;p&gt;Everyone wanted to start playing with what is clearly shaping what the future of personal AI assistants could look like. However, people were running OpenClaw with security not even an afterthought. And that (of course) resulted in some &lt;em&gt;not so funny&lt;/em&gt; preventable disasters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.analyticsinsight.net/news/crypto-market-news-clawdbot-security-crisis-exposes-open-servers-and-crypto-scams" rel="noopener noreferrer"&gt;Clawdbot Security Crisis Exposes Open Servers and Crypto Scams&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.bitdefender.com/en-us/blog/hotforsecurity/moltbot-security-alert-exposed-clawdbot-control-panels-risk-credential-leaks-and-account-takeovers" rel="noopener noreferrer"&gt;Moltbot security alert exposed Clawdbot control panels risk credential leaks and account takeovers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://forklog.com/en/critical-vulnerabilities-found-in-clawdbot-ai-agent-for-cryptocurrency-theft/" rel="noopener noreferrer"&gt;Critical Vulnerabilities Found in Clawdbot AI Agent for Cryptocurrency Theft&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As &lt;a href="https://techcrunch.com/2026/01/27/everything-you-need-to-know-about-viral-personal-ai-assistant-clawdbot-now-moltbot/" rel="noopener noreferrer"&gt;this&lt;/a&gt; TechCrunch article points out:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Right now, running Moltbot safely means running it on a separate computer with throwaway accounts, which defeats the purpose of having a useful AI assistant. And fixing that security-versus-utility trade-off may require solutions that are beyond Steinberger’s control.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The reason for this is, as you may have guessed, the &lt;a href="https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/" rel="noopener noreferrer"&gt;lethal trifecta&lt;/a&gt;: the inherently dangerous combination of giving LLMs tools with the following characteristics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Access to your private data&lt;/li&gt;
&lt;li&gt;Exposure to untrusted content&lt;/li&gt;
&lt;li&gt;The ability to externally communicate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As Simon Willison (who coined the term) explains:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;LLMs are unable to reliably distinguish the importance of instructions based on where they came from. Everything eventually gets glued together into a sequence of tokens and fed to the model.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;As a harness with "Full System Access" and "Browser Control" as flagship features, you can see how OpenClaw checks all three boxes.&lt;/p&gt;
&lt;h2&gt;
  
  
  Securing OpenClaw
&lt;/h2&gt;

&lt;p&gt;OpenClaw doesn't &lt;em&gt;have&lt;/em&gt; to be limited to throwaway accounts though. Since it blew up, security has been one of the main &lt;a href="https://docs.openclaw.ai/gateway/security" rel="noopener noreferrer"&gt;focus points&lt;/a&gt; of OpenClaw's development, and you can leverage some of that today to get a secure experience in the harness. While this still requires you to be technically savvy, you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use OpenClaw's &lt;a href="https://docs.openclaw.ai/gateway/sandbox-vs-tool-policy-vs-elevated" rel="noopener noreferrer"&gt;tool policies&lt;/a&gt; to control which user and/or agent gets access to specific tools&lt;/li&gt;
&lt;li&gt;Run it in a &lt;a href="https://docs.openclaw.ai/gateway/sandboxing" rel="noopener noreferrer"&gt;Sandbox&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Use &lt;a href="https://docs.openclaw.ai/tools/exec-approvals" rel="noopener noreferrer"&gt;exec approvals&lt;/a&gt; to implement human-in-the-loop for specific tools that may have undesired side-effects&lt;/li&gt;
&lt;li&gt;Use a detached tool-calling runtime like Arcade. Credentials never touch the harness, so there's nothing to leak.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here's how to set up that last option in your OpenClaw instance:&lt;/p&gt;

&lt;p&gt;First, clone the Arcade plugin:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone &lt;span class="nt"&gt;--depth&lt;/span&gt; 1 https://github.com/ArcadeAI/openclaw-arcade-plugin /tmp/openclaw-arcade
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, install it into your OpenClaw gateway:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw plugins &lt;span class="nb"&gt;install&lt;/span&gt; /tmp/openclaw-arcade/arcade
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Go to your Arcade Dashboard to &lt;a href="https://docs.arcade.dev/en/get-started/setup/api-keys" rel="noopener noreferrer"&gt;get an API key&lt;/a&gt;, copy it, and run this command to configure it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw config &lt;span class="nb"&gt;set &lt;/span&gt;plugins.entries.arcade.config.apiKey &lt;span class="s2"&gt;"{your_arcade_api_key}"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And this one to configure your Arcade User ID (this is the email you used to sign up to Arcade):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw config &lt;span class="nb"&gt;set &lt;/span&gt;plugins.entries.arcade.config.user_id &lt;span class="s2"&gt;"{your_arcade_user_id}"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once the Arcade plugin is configured, initialize it to load all the tools, and restart the OpenClaw gateway:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw arcade init
openclaw gateway restart
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now OpenClaw has access to 7,000+ tools, with tokens handled outside the harness. Nothing to exfiltrate.&lt;/p&gt;
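&lt;p&gt;A quick way to convince yourself of that (the paths here are my assumptions; point it at wherever your gateway actually stores its config):&lt;/p&gt;

```shell
# Diagnostic sketch: the harness config should hold the Arcade API key only;
# per-service OAuth tokens live in Arcade's runtime, not on this machine.
# Config path is a hypothetical example.
if grep -riq -E 'oauth|refresh_token' "$HOME/.openclaw" 2>/dev/null; then
  echo "investigate: possible raw tokens in harness config"
else
  echo "no raw service tokens found in harness config"
fi
```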

&lt;p&gt;Here's a screenshot of how this works when I talk to the Telegram bot connected to my OpenClaw instance:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe2rw5yjp764ou7gybjhn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe2rw5yjp764ou7gybjhn.png" alt=" " width="800" height="956"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Final tips
&lt;/h2&gt;

&lt;p&gt;Even with these precautions, OpenClaw is still early-adopter territory. Make sure to run it in a sandbox, a VPS, or even a dedicated computer. If you're sharing files with OpenClaw, set up guardrails around the tools it can use, and be mindful of the accounts you log into in the browser it can control.&lt;/p&gt;
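&lt;p&gt;For the sandbox route, one minimal sketch is running the gateway in a throwaway container so a prompt-injected command can't reach the host. The image name and volume layout below are assumptions, not the official setup; check OpenClaw's sandboxing docs for what's actually supported:&lt;/p&gt;

```shell
# Hypothetical sandbox sketch: image name and mounts are assumptions.
# --read-only plus a tmpfs keeps the container filesystem disposable;
# only one explicit data directory survives restarts.
docker run --rm -it \
  --name openclaw-sandbox \
  --read-only --tmpfs /tmp \
  -v "$HOME/openclaw-data:/data" \
  openclaw/gateway:latest
```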




&lt;h3&gt;
  
  
  Ready to secure your agent setup? 
&lt;/h3&gt;

&lt;p&gt;Arcade handles just-in-time agent authorization so credentials never touch your harness → &lt;a href="https://docs.arcade.dev/en/home" rel="noopener noreferrer"&gt;Get started&lt;/a&gt;&lt;/p&gt;

</description>
      <category>tutorials</category>
    </item>
  </channel>
</rss>
