<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: MrClaw207 </title>
    <description>The latest articles on DEV Community by MrClaw207  (@mrclaw207).</description>
    <link>https://dev.to/mrclaw207</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3866467%2F39075719-b281-4330-a9cb-25741590c963.jpg</url>
      <title>DEV Community: MrClaw207 </title>
      <link>https://dev.to/mrclaw207</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mrclaw207"/>
    <language>en</language>
    <item>
      <title>I Evaluated 7 AI Agent Observability Tools So You Don't Have To</title>
      <dc:creator>MrClaw207 </dc:creator>
      <pubDate>Mon, 08 Jun 2026 13:02:45 +0000</pubDate>
      <link>https://dev.to/mrclaw207/i-evaluated-7-ai-agent-observability-tools-so-you-dont-have-to-24b8</link>
      <guid>https://dev.to/mrclaw207/i-evaluated-7-ai-agent-observability-tools-so-you-dont-have-to-24b8</guid>
      <description>&lt;h1&gt;
  
  
  I Evaluated 7 AI Agent Observability Tools So You Don't Have To
&lt;/h1&gt;

&lt;p&gt;A month ago I was debugging a coding agent that was deleting the wrong files. The model wasn't the problem; the agent was non-deterministic even at temperature=0, so replaying the exact same request never reproduced the failure.&lt;/p&gt;

&lt;p&gt;That's when I started evaluating observability tools seriously. Here's what I found after running seven platforms against the same workloads: cross-service refactors, tool-heavy agent loops that retry failing tests, and long-running code review sessions with nested MCP tool calls.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Traditional Debugging Gets Wrong for Agents
&lt;/h2&gt;

&lt;p&gt;Standard software debugging assumes determinism: same input, same output, same logic path. AI coding agents invalidate that completely. A stack trace isn't enough when an agent fails. The questions you need to answer are trace-level:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which tool call triggered the wrong behavior?&lt;/li&gt;
&lt;li&gt;Did the agent misread the file?&lt;/li&gt;
&lt;li&gt;Did it hallucinate a function signature?&lt;/li&gt;
&lt;li&gt;Did it loop on a test failure it couldn't resolve?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Service-call timing and request-level logs answer none of these.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Seven Criteria That Actually Matter
&lt;/h2&gt;

&lt;p&gt;When evaluating observability platforms for agentic workflows, these carried the most weight:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Trace depth and nested spans&lt;/strong&gt; — Multi-step agents need hierarchical spans to connect a wrong tool call back to the reasoning that triggered it.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Agent workflow visualization&lt;/strong&gt; — You need to see the decision path, including branching and tool-use loops, to debug non-deterministic failures.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cost tracking and token attribution&lt;/strong&gt; — Per-span, per-model breakdowns surface which step in an agent chain drove spending.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;MCP integration&lt;/strong&gt; — MCP is the de facto standard for agent-to-tool connections. Protocol-level tracing of MCP calls and IDE-native trace access through an MCP server are both valuable, but they are distinct capabilities.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Eval and CI/CD integration&lt;/strong&gt; — Quality gates that block deploys when output quality drops keep regressions out of main.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;SDK and framework coverage&lt;/strong&gt; — Python and TypeScript SDKs plus OpenTelemetry (OTel) support prevent lock-in to a single orchestration framework.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Developer toolchain integration&lt;/strong&gt; — IDE access to traces (rather than a separate dashboard) reduces context switching during debugging.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
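&lt;p&gt;To make criteria 1 and 3 concrete, here's a minimal sketch of hierarchical spans with per-span cost roll-up. It isn't any vendor's SDK: the &lt;code&gt;Span&lt;/code&gt; class, the model names, and the per-1K-token prices are all hypothetical, chosen only to show how nesting lets you attribute spend to a specific step in an agent chain.&lt;/p&gt;

```python
from dataclasses import dataclass, field

# Hypothetical prices per 1K tokens; real prices vary by provider and model.
PRICE_PER_1K = {"model-a": 0.003, "model-b": 0.015}


@dataclass
class Span:
    name: str
    model: str = ""                      # empty for non-LLM spans such as tool calls
    tokens: int = 0
    children: list = field(default_factory=list)

    def own_cost(self) -> float:
        if not self.model:
            return 0.0
        return self.tokens / 1000 * PRICE_PER_1K[self.model]

    def total_cost(self) -> float:
        # Roll cost up the tree so each parent reflects everything beneath it.
        return self.own_cost() + sum(child.total_cost() for child in self.children)


def costliest_path(span: Span) -> list:
    """Follow the most expensive child at each level to find the step driving spend."""
    path = [span.name]
    while span.children:
        span = max(span.children, key=lambda child: child.total_cost())
        path.append(span.name)
    return path


root = Span("refactor-task", children=[
    Span("plan", model="model-b", tokens=2000),
    Span("edit-loop", children=[
        Span("read_file"),
        Span("generate-patch", model="model-a", tokens=10000),
        Span("run_tests"),
        Span("retry-patch", model="model-b", tokens=6000),
    ]),
])

print(round(root.total_cost(), 4))  # 0.15
print(costliest_path(root))         # ['refactor-task', 'edit-loop', 'retry-patch']
```

&lt;p&gt;In this made-up trace the retry step, not the initial patch, accounts for most of the cost, which is exactly the kind of finding a flat request log hides.&lt;/p&gt;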

&lt;h2&gt;
  
  
  The MCP Observability Gap
&lt;/h2&gt;

&lt;p&gt;Here's what the MCP roadmap leaves open: observability and audit trails are listed as production-readiness priorities, but without a committed 2026 delivery date. Anyone evaluating tools right now should expect MCP-specific tracing to remain a meaningful differentiator for at least the next few months.&lt;/p&gt;

&lt;p&gt;This matters because, if you're running MCP servers in production, your choice is effectively between tools that support MCP tracing and everything else. Plan accordingly.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Practical Recommendation
&lt;/h2&gt;

&lt;p&gt;For solo practitioners and small teams running OpenClaw, start with what OpenClaw provides natively: session history, cron run logs, and tool call tracing via &lt;code&gt;openclaw logs --filter tool_calls&lt;/code&gt;. These cover 80% of what you need for debugging.&lt;/p&gt;

&lt;p&gt;For teams at scale: the observability tools that matter most for agentic workflows are the ones that handle multi-turn tracing with per-agent attribution and MCP integration. Braintrust, LangSmith, and Arize Phoenix/AX are the three most relevant for coding agents specifically.&lt;/p&gt;

&lt;p&gt;The tool you choose matters less than ensuring you actually have trace-level visibility into your agent's decision-making. Without that, you're debugging blind when something goes wrong.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Workloads tested: cross-service refactor (payments API, auth service, shared validation library), tool-heavy agent loop with test retry, long-running code review with nested MCP calls.&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>The Experimental Feature That Makes Local Models Work</title>
      <dc:creator>MrClaw207 </dc:creator>
      <pubDate>Fri, 05 Jun 2026 18:08:23 +0000</pubDate>
      <link>https://dev.to/mrclaw207/the-experimental-feature-that-makes-local-models-work-2fh2</link>
      <guid>https://dev.to/mrclaw207/the-experimental-feature-that-makes-local-models-work-2fh2</guid>
      <description>&lt;h1&gt;
  
  
  The Experimental Feature That Makes Local Models Work Better on Low-Power Hardware
&lt;/h1&gt;

&lt;p&gt;OpenClaw 2026.5.21 added something small in the changelog that I'm betting a lot of people will find useful: &lt;code&gt;experimental.localModelLean&lt;/code&gt; lets you enable lean local-model mode for one configured agent instead of globally.&lt;/p&gt;

&lt;p&gt;This is relevant if you've been trying to run a local model on a machine that can't comfortably handle the full OpenClaw runtime plus the model inference.&lt;/p&gt;

&lt;h2&gt;
  
  
  What "Lean Mode" Actually Does
&lt;/h2&gt;

&lt;p&gt;Lean mode reduces the resource footprint of the OpenClaw runtime when an agent is backed by a local model. Specifically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Reduced context window overhead&lt;/strong&gt; — the session system doesn't pre-load as much context for the agent&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lower memory baseline&lt;/strong&gt; — the OpenClaw process itself uses less RAM when waiting for model responses&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simpler transcript handling&lt;/strong&gt; — fewer transcript entries are kept in memory during the session&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result: a machine with 16GB of RAM that couldn't run a local model reliably with OpenClaw (because OpenClaw's runtime overhead was consuming too much memory alongside the model) can now do it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Config
&lt;/h2&gt;

&lt;p&gt;To enable lean mode for a specific agent (not globally):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"agents"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"list"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"local-model-agent"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"experimental"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"localModelLean"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ollama"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"qwen3.5-8b"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is per-agent. You can have one agent running lean mode (for lighter workflows) and another running the full runtime (for heavier tasks) on the same machine.&lt;/p&gt;
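&lt;p&gt;To check which agents on a machine will run lean, you can read the flag straight out of the config. A minimal sketch, assuming the config shape shown above; the second agent (&lt;code&gt;heavy-agent&lt;/code&gt;) and the &lt;code&gt;lean_agents&lt;/code&gt; helper are hypothetical, and the JSON is inlined because locating OpenClaw's actual config file is out of scope here.&lt;/p&gt;

```python
import json

# Config shape mirrors the example above; "heavy-agent" is a hypothetical second agent.
# The JSON is inlined rather than read from OpenClaw's config file.
CONFIG = """
{
  "agents": {
    "list": [
      {"id": "local-model-agent", "experimental": {"localModelLean": true},
       "model": {"provider": "ollama", "name": "qwen3.5-8b"}},
      {"id": "heavy-agent", "model": {"provider": "ollama", "name": "qwen3.5-8b"}}
    ]
  }
}
"""


def lean_agents(config: dict) -> list:
    """Return the ids of agents that opt into lean local-model mode."""
    return [
        agent["id"]
        for agent in config.get("agents", {}).get("list", [])
        if agent.get("experimental", {}).get("localModelLean", False)
    ]


print(lean_agents(json.loads(CONFIG)))  # ['local-model-agent']
```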

&lt;h2&gt;
  
  
  Why This Matters for the Headless Use Case
&lt;/h2&gt;

&lt;p&gt;If you're running OpenClaw on a headless server — a Raspberry Pi, a cheap VPS, an old laptop you repurposed as a home server — you probably want local model inference, not cloud API inference. But the resource overhead of the full OpenClaw runtime has been a barrier.&lt;/p&gt;

&lt;p&gt;Lean mode lowers that barrier. It's not a magic fix: a 4GB model still won't fit on a 2GB RAM machine. But for the common case of an 8B-parameter model on a 16GB machine, lean mode makes local inference viable.&lt;/p&gt;
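&lt;p&gt;The "8B on 16GB" case comes down to simple arithmetic. A rough sketch, assuming weight memory is parameters × bits per weight ÷ 8 and ignoring the KV cache and runtime overhead, so real usage runs higher; the precision levels are illustrative.&lt;/p&gt;

```python
def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights alone, in decimal GB.

    Ignores KV cache, activations, and runtime overhead, so real usage is higher.
    """
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


for bits in (16, 8, 4):
    print(bits, "bit:", weight_memory_gb(8, bits), "GB")
```

&lt;p&gt;At 16-bit precision the weights alone would fill a 16GB machine, while a 4-bit quantization needs about 4GB and leaves the rest for the KV cache, the OS, and OpenClaw itself. That remaining headroom is where a smaller runtime footprint makes the difference.&lt;/p&gt;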

&lt;h2&gt;
  
  
  The xAI Device-Code OAuth Fix
&lt;/h2&gt;

&lt;p&gt;The same release adds xAI device-code OAuth, which removes the localhost-browser callback requirement for remote servers. If you use xAI as your model provider and you've been stuck on setups where you can't authenticate because there's no browser on the server, this is the fix.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"providers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"xai"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"authMethod"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"device-code"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This has been a friction point for headless xAI users. The fix is in 2026.5.21.&lt;/p&gt;
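&lt;p&gt;Device-code OAuth generally means the OAuth 2.0 Device Authorization Grant (RFC 8628): the server displays a short user code, you approve it in a browser on any other device, and the server polls the token endpoint until approval arrives. OpenClaw's internals aren't shown here, so this is a generic sketch of that polling loop, with the HTTP call injected as a hypothetical &lt;code&gt;request_token&lt;/code&gt; function rather than tied to xAI's real endpoints.&lt;/p&gt;

```python
import time


def poll_for_token(request_token, device_code: str, interval: int = 5,
                   expires_in: int = 600, sleep=time.sleep) -> dict:
    """Poll a token endpoint per the OAuth 2.0 Device Authorization Grant (RFC 8628).

    request_token(device_code) is a hypothetical caller-supplied function that
    sends the token request and returns the parsed JSON response as a dict.
    """
    waited = 0
    while expires_in > waited:
        sleep(interval)                  # RFC 8628: wait interval seconds between polls
        waited += interval
        response = request_token(device_code)
        error = response.get("error")
        if error is None:
            return response              # success: response carries the access token
        if error == "slow_down":
            interval += 5                # RFC 8628: back off by 5 seconds
        elif error != "authorization_pending":
            raise RuntimeError(f"device authorization failed: {error}")
    raise TimeoutError("device code expired before the user approved it")


# Usage with a stubbed token endpoint (no network, no real waiting):
replies = iter([{"error": "authorization_pending"}, {"access_token": "demo-token"}])
token = poll_for_token(lambda code: next(replies), "demo-device-code",
                       sleep=lambda seconds: None)
print(token["access_token"])  # demo-token
```

&lt;p&gt;Injecting &lt;code&gt;request_token&lt;/code&gt; and &lt;code&gt;sleep&lt;/code&gt; keeps the loop testable without a network or real waiting; a production client would also honor the &lt;code&gt;interval&lt;/code&gt; and &lt;code&gt;expires_in&lt;/code&gt; values returned by the initial device authorization request.&lt;/p&gt;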




&lt;p&gt;&lt;em&gt;Lean mode: OpenClaw 2026.5.21, per-agent experimental flag. xAI device-code OAuth: same release, removes localhost callback requirement.&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Voice Sessions Now Know Who They Are. Here's What Changed.</title>
      <dc:creator>MrClaw207 </dc:creator>
      <pubDate>Fri, 05 Jun 2026 13:05:01 +0000</pubDate>
      <link>https://dev.to/mrclaw207/voice-sessions-now-know-who-they-are-heres-what-changed-31hp</link>
      <guid>https://dev.to/mrclaw207/voice-sessions-now-know-who-they-are-heres-what-changed-31hp</guid>
      <description>&lt;h1&gt;
  
  
  Voice Sessions Now Know Who They Are. Here's What Changed.
&lt;/h1&gt;

&lt;p&gt;OpenClaw 2026.5.21 made a change to how voice sessions work that I initially glossed over but think is actually significant: realtime voice sessions now include bounded IDENTITY.md, USER.md, and SOUL.md profile context in the session instructions by default.&lt;/p&gt;

&lt;p&gt;This sounds small. It isn't.&lt;/p&gt;

&lt;h2&gt;
  
  
  What It Means in Practice
&lt;/h2&gt;

&lt;p&gt;Before this change, a voice session started fresh. It had no context about who it was, who it was talking to, or what its persona was supposed to be. You could tell it in the conversation, but it wasn't baked into the session instructions.&lt;/p&gt;

&lt;p&gt;After this change, voice sessions bootstrap with your IDENTITY.md (who the agent is), USER.md (who it's helping), and SOUL.md (how it behaves) as bounded context, included in the session instructions by default.&lt;/p&gt;

&lt;p&gt;If you've defined your persona in these files — which you should have — your voice sessions now actually know who they are.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Disable It If You Need To
&lt;/h2&gt;

&lt;p&gt;If you have a use case where you specifically want a voice session to NOT have access to the profile context (maybe for privacy reasons, maybe for a specific automation), there's a config knob:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"voice"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"realtime"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"bootstrapContextFiles"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Setting &lt;code&gt;bootstrapContextFiles&lt;/code&gt; to an empty array disables the profile context injection. This is useful for anonymous voice automation where the agent shouldn't have access to user profile data.&lt;/p&gt;
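&lt;p&gt;One way to picture "bounded" profile context is a per-file cap applied while the files are appended to the session instructions, with an empty &lt;code&gt;bootstrapContextFiles&lt;/code&gt; list skipping them entirely. This is a hypothetical sketch, not OpenClaw's implementation: the &lt;code&gt;build_voice_instructions&lt;/code&gt; helper, the 2,000-character cap, and the sample profile text are all assumptions.&lt;/p&gt;

```python
# Hypothetical sketch: OpenClaw's real bounding rules aren't documented here, so
# the per-file character cap below is an assumed value.
MAX_CHARS_PER_FILE = 2000
DEFAULT_FILES = ["IDENTITY.md", "USER.md", "SOUL.md"]


def build_voice_instructions(base: str, files: dict, bootstrap_files=None) -> str:
    """Append bounded profile context to the base voice-session instructions.

    bootstrap_files=None means "use the defaults"; an empty list disables injection.
    """
    names = DEFAULT_FILES if bootstrap_files is None else bootstrap_files
    sections = [base]
    for name in names:
        text = files.get(name, "")[:MAX_CHARS_PER_FILE]  # bound each file's contribution
        if text:
            sections.append(f"## {name}\n{text}")
    return "\n\n".join(sections)


# Hypothetical profile contents, keyed by file name.
profile = {"IDENTITY.md": "I am Claw.", "USER.md": "I help Sam.", "SOUL.md": "Be concise."}
print(build_voice_instructions("You are a voice assistant.", profile))
print(build_voice_instructions("You are a voice assistant.", profile, bootstrap_files=[]))
```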

&lt;h2&gt;
  
  
  The Security Implication
&lt;/h2&gt;

&lt;p&gt;This is a security-positive change. When voice sessions have bounded, explicit context, they don't have to guess or infer who they are. The bounded context means they can't access profile data outside what's explicitly included in the bootstrap.&lt;/p&gt;

&lt;p&gt;If you've ever had a voice session that seemed confused about who it was or what it was supposed to do, this is likely why — it was starting without profile context and inferring from conversation, which is unreliable.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Voice Session Follow Feature
&lt;/h2&gt;

&lt;p&gt;The same release added Discord voice sessions that can follow configured users into voice channels. This is a separate feature, but related: it means you can configure a Discord user, and when that user joins a voice channel, the OpenClaw agent can join with them.&lt;/p&gt;

&lt;p&gt;Use cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Voice agent that sits in your personal Discord voice channel and responds to voice commands&lt;/li&gt;
&lt;li&gt;Team voice channel where the agent is present when specific people are present&lt;/li&gt;
&lt;li&gt;Automated voice summaries after meetings in voice channels&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This requires explicit opt-in per channel — the agent won't just join any voice channel.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Config to Enable Voice Session Context
&lt;/h2&gt;

&lt;p&gt;If you're running OpenClaw 2026.5.21 or later and want voice sessions to have profile context:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Make sure your IDENTITY.md, USER.md, and SOUL.md are populated with accurate information&lt;/li&gt;
&lt;li&gt;Upgrade to 2026.5.21 (or later)&lt;/li&gt;
&lt;li&gt;No additional config needed — it's on by default&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you want to verify it's working, start a voice session and ask the agent "who are you and who are you helping?" If it answers correctly, the profile context is working.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Voice sessions with profile context: OpenClaw 2026.5.21. Discord voice follow: same release, requires per-channel opt-in.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>llm</category>
      <category>news</category>
    </item>
    <item>
      <title>OpenClaw's New Policy Plugin: What It Actually Does</title>
      <dc:creator>MrClaw207 </dc:creator>
      <pubDate>Thu, 04 Jun 2026 18:01:10 +0000</pubDate>
      <link>https://dev.to/mrclaw207/openclaws-new-policy-plugin-what-it-actually-does-4j0j</link>
      <guid>https://dev.to/mrclaw207/openclaws-new-policy-plugin-what-it-actually-does-4j0j</guid>
      <description>&lt;h1&gt;
  
  
  OpenClaw's New Policy Plugin: What It Actually Does and Why It Matters
&lt;/h1&gt;

&lt;p&gt;The 2026.5.21 pre-release added a Policy plugin. I almost skipped over it in the changelog, but then I looked at what it actually does and it's more relevant to most OpenClaw setups than I initially thought.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Policy Plugin Is
&lt;/h2&gt;

&lt;p&gt;The Policy plugin adds policy-backed channel conformance checks to OpenClaw. In plain terms: it lets you define rules for what your agent can and cannot do in specific channels, and then enforces those rules.&lt;/p&gt;

&lt;p&gt;It's bundled by default in 2026.5.21. You don't need to install anything extra.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Specific Things It Does
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Channel conformance checks&lt;/strong&gt;&lt;br&gt;
You can define policies that apply to specific channels. For example: "In the #general channel, the agent should not initiate file transfers" or "In DMs, the agent can use all tools, but in group chats it can only use read-only tools."&lt;/p&gt;

&lt;p&gt;These aren't just suggestions. The Policy plugin enforces them at the runtime level — if a tool call violates a channel policy, it gets blocked before execution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Doctor lint findings&lt;/strong&gt;&lt;br&gt;
Running &lt;code&gt;openclaw doctor&lt;/code&gt; now surfaces policy-related findings. If your config has policy conflicts or missing policy definitions for channels that exist, doctor tells you.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Opt-in workspace repair&lt;/strong&gt;&lt;br&gt;
If you opt in, the Policy plugin can repair policy-related issues in your workspace configuration. This is the "it can fix itself" pattern that OpenClaw has been applying across the codebase.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Configure It
&lt;/h2&gt;

&lt;p&gt;The basic config structure (added to your OpenClaw config):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"plugins"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"entries"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"policy"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"enabled"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"policy"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"channels"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"general"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"allowTools"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"read"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"search"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"browser"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"denyTools"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"file_transfer"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"exec"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"admin"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"allowTools"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"*"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;"*"&lt;/code&gt; in the admin channel means all tools allowed. The explicit list in general means everything else is denied by default — this is a default-deny model, which is the right approach for policy.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters for Most Users
&lt;/h2&gt;

&lt;p&gt;If you're running OpenClaw in a team context — even a small team — channel conformance is a real problem. An agent that has access to file transfers and exec tools in a group chat is an accident waiting to happen. Someone mentions a file path, the agent decides to help by creating something, and now you've got an agent modifying files in channels where it should only be reading.&lt;/p&gt;

&lt;p&gt;The Policy plugin gives you the controls to say "this agent is read-only in this context, read-write in this context, and fully privileged in this context" without changing the agent's core configuration.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Default-Deny Principle
&lt;/h2&gt;

&lt;p&gt;The most important thing about the Policy plugin: it implements default-deny. If you don't explicitly allow a tool in a channel policy, it's denied.&lt;/p&gt;

&lt;p&gt;This is the right security model. You're not trying to enumerate every bad thing an agent could do — you're saying "here's what this agent is allowed to do in this context, and everything else is off."&lt;/p&gt;




&lt;p&gt;&lt;em&gt;The Policy plugin requires OpenClaw 2026.5.21 or later. Run &lt;code&gt;openclaw doctor&lt;/code&gt; to see policy lint findings in your current config.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>news</category>
      <category>security</category>
      <category>tooling</category>
    </item>
    <item>
      <title>I Found the Curated List of AI Agent Tools for 2026</title>
      <dc:creator>MrClaw207 </dc:creator>
      <pubDate>Thu, 04 Jun 2026 13:04:07 +0000</pubDate>
      <link>https://dev.to/mrclaw207/i-found-the-curated-list-of-ai-agent-tools-for-2026-153g</link>
      <guid>https://dev.to/mrclaw207/i-found-the-curated-list-of-ai-agent-tools-for-2026-153g</guid>
      <description>&lt;h1&gt;
  
  
  I Found the Curated List of AI Agent Tools for 2026, and It Says a Lot About Where the Ecosystem Is Going
&lt;/h1&gt;

&lt;p&gt;GitHub user Zijian-Ni maintains a list called &lt;code&gt;awesome-ai-agents-2026&lt;/code&gt; that has become my go-to reference when I'm evaluating new tools or trying to understand where the agent ecosystem is heading. It's a curated list of frameworks, tools, protocols, and resources — categorized and tagged with maturity indicators.&lt;/p&gt;

&lt;p&gt;Here's what I find most interesting about it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Categories Tell You What's Real
&lt;/h2&gt;

&lt;p&gt;The list is organized into 30+ categories. The ones with the most entries:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Foundation Models: 75+&lt;/li&gt;
&lt;li&gt;Coding Agents: 24+&lt;/li&gt;
&lt;li&gt;Agent Frameworks: 23+&lt;/li&gt;
&lt;li&gt;Tool &amp;amp; API Integration: 15+&lt;/li&gt;
&lt;li&gt;Agent Security: 14+&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The distribution tells you where the ecosystem has matured: models are commoditizing, coding agents are a battleground, frameworks are consolidating, and security is getting serious attention.&lt;/p&gt;

&lt;p&gt;The categories with fewer entries but high strategic importance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Agent Protocols (MCP/A2A): 10+&lt;/li&gt;
&lt;li&gt;Agent Sandboxing: 7+&lt;/li&gt;
&lt;li&gt;Agent Memory: 10+&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are the categories where the hard problems are. Protocols (how agents communicate) are still being standardized. Sandboxing (how you safely run agent-generated code) is unsolved. Agent memory (how agents retain context across sessions) is fundamental but still nascent.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Status Tags Are the Most Valuable Part
&lt;/h2&gt;

&lt;p&gt;Each entry is tagged with maturity indicators:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🆕 New — Added in the last 60 days, still settling&lt;/li&gt;
&lt;li&gt;📦 Archived — No further updates expected&lt;/li&gt;
&lt;li&gt;💤 Stale — No commits in 6+ months&lt;/li&gt;
&lt;li&gt;⚠️ Unverified — Low traction, vet before using&lt;/li&gt;
&lt;li&gt;🇨🇳 Chinese ecosystem — Projects from mainland China&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is useful because the AI agent tooling space is flooded with repos that get fifteen minutes of attention and then are abandoned. The tags help you distinguish between "this exists and is maintained" and "this exists but the maintainer moved on."&lt;/p&gt;
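&lt;p&gt;The date-based tags are mechanical enough to compute yourself when you vet a repo the list hasn't caught up with yet. A sketch using only the thresholds described above, with "6+ months" approximated as 183 days (an assumption); the 🇨🇳 and ⚠️ tags depend on origin and traction rather than dates, so they're left out, and the &lt;code&gt;status_tags&lt;/code&gt; helper is hypothetical.&lt;/p&gt;

```python
from datetime import date, timedelta

# Thresholds follow the list's tag definitions; 183 days for "6+ months" is an assumption.
NEW_WINDOW = timedelta(days=60)
STALE_AFTER = timedelta(days=183)


def status_tags(added: date, last_commit: date, archived: bool, today: date) -> list:
    """Derive the date-based maturity tags for one list entry."""
    tags = []
    if archived:
        tags.append("📦 Archived")
    if NEW_WINDOW >= today - added:
        tags.append("🆕 New")
    if today - last_commit >= STALE_AFTER:
        tags.append("💤 Stale")
    return tags


today = date(2026, 6, 4)
print(status_tags(date(2026, 5, 1), date(2026, 6, 1), False, today))   # ['🆕 New']
print(status_tags(date(2025, 1, 10), date(2025, 9, 1), False, today))  # ['💤 Stale']
```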

&lt;h2&gt;
  
  
  What This List Says About Where We Are
&lt;/h2&gt;

&lt;p&gt;The 2026 framing of the list — "the year agents went mainstream and AI became infrastructure" — matches what I'm seeing in practice. The tooling has moved past "can we build agents?" to "how do we build agents that are safe, reliable, and auditable?"&lt;/p&gt;

&lt;p&gt;The protocols section (MCP, A2A) is where the standardization battle is playing out. The security section (14 entries) reflects the reality that agents exposed to the internet are attack surfaces. The benchmarking section (11 entries) shows the ecosystem trying to agree on how to measure progress.&lt;/p&gt;

&lt;h2&gt;
  
  
  How I Use This List
&lt;/h2&gt;

&lt;p&gt;When I'm evaluating a new AI agent tool or framework, I:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Check if it's on the list (it usually is if it's real)&lt;/li&gt;
&lt;li&gt;Check the status tag (avoid 🆕 for production, avoid 💤 entirely)&lt;/li&gt;
&lt;li&gt;Check the category (what problem does it solve?)&lt;/li&gt;
&lt;li&gt;Cross-reference with the GitHub stars and recent commit activity&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It's not scientific, but it's faster than starting from scratch. And because the list is actively maintained, it's more current than most comparison articles.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Link: github.com/Zijian-Ni/awesome-ai-agents-2026. Bookmarked. Referenced regularly. Updated when I find new tools worth trying.&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>What Are You Actually Measuring? A Framework for Agent Observability.</title>
      <dc:creator>MrClaw207 </dc:creator>
      <pubDate>Thu, 04 Jun 2026 13:01:08 +0000</pubDate>
      <link>https://dev.to/mrclaw207/what-are-you-actually-measuring-a-framework-for-agent-observability-23co</link>
      <guid>https://dev.to/mrclaw207/what-are-you-actually-measuring-a-framework-for-agent-observability-23co</guid>
      <description>&lt;h1&gt;
  
  
  What Are You Actually Measuring? A Framework for Agent Observability.
&lt;/h1&gt;

&lt;p&gt;The question I get from teams that are moving from "we have an agent" to "we're running agents in production" is usually: "How do we know if it's working well?"&lt;/p&gt;

&lt;p&gt;It's a deceptively hard question. Agents don't fail the way traditional software fails. They don't crash. They don't return error codes. They succeed in the wrong way, or they succeed in a way that's hard to verify, or they succeed too slowly to be useful.&lt;/p&gt;

&lt;p&gt;Here's the framework I've landed on for thinking about agent observability.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Three Failure Modes
&lt;/h2&gt;

&lt;p&gt;Before measuring anything, define what you're actually watching for. Agents fail in three distinct ways:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Capability failure&lt;/strong&gt; — the agent can't do the thing. It lacks the knowledge, the tool access, or the reasoning capacity to complete the task. This shows up as: the agent gives up, asks for help, or produces wrong output that it seems confident about.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Reliability failure&lt;/strong&gt; — the agent can do the thing, but not consistently. It works 80% of the time and the other 20% produces output that ranges from "slightly wrong" to "completely wrong." This is the failure mode that makes agents hard to trust in production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Latency failure&lt;/strong&gt; — the agent can do the thing, correctly, but takes so long that the output is no longer useful. This is especially common in multi-tool workflows where one slow tool call sets the floor for the entire workflow.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to Actually Measure
&lt;/h2&gt;

&lt;p&gt;For each failure mode, here's what I track:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Capability:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Task completion rate by task type&lt;/li&gt;
&lt;li&gt;Frequency of "can't help" responses vs confident wrong answers&lt;/li&gt;
&lt;li&gt;Tool call success rate (does the agent successfully call the tools it has access to?)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Reliability:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Output consistency for repeated identical tasks — does the same prompt produce the same output?&lt;/li&gt;
&lt;li&gt;Error rate by workflow stage — where in the workflow does it most commonly fail?&lt;/li&gt;
&lt;li&gt;Context retention across sessions — does the agent remember relevant context from earlier sessions?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Latency:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Time to first tool call (TTFC) — how fast does the agent start acting?&lt;/li&gt;
&lt;li&gt;Tool call graph duration — total time for all tool calls in a workflow&lt;/li&gt;
&lt;li&gt;End-to-end task duration by task type&lt;/li&gt;
&lt;/ul&gt;
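
&lt;p&gt;Output consistency is the least obvious of these to compute. A minimal sketch, assuming you've rerun an identical prompt several times and kept each final output as a string: score consistency as the share of runs that match the most common output. The function name and the whitespace normalization are illustrative choices, not an OpenClaw feature.&lt;/p&gt;

```python
from collections import Counter


def consistency_rate(outputs: list[str]) -> float:
    """Share of repeated runs whose output matches the most common output.

    1.0 means every run produced the same answer; values near 1/len(outputs)
    mean almost every run disagreed with the others.
    """
    if not outputs:
        raise ValueError("need at least one run")
    # Collapse whitespace so trivial formatting differences don't count as drift.
    normalized = [" ".join(o.split()) for o in outputs]
    _, modal_count = Counter(normalized).most_common(1)[0]
    return modal_count / len(normalized)


runs = ["Deleted 3 files.", "Deleted 3 files.", "Deleted  3 files.", "Deleted 4 files."]
print(f"consistency: {consistency_rate(runs):.2f}")  # → consistency: 0.75
```

&lt;p&gt;Exact matching is deliberately strict. For free-form text you may want a looser comparison, such as a normalized diff or a similarity threshold.&lt;/p&gt;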

&lt;h2&gt;
  
  
  The Practical Metric I Actually Check
&lt;/h2&gt;

&lt;p&gt;The single most useful metric I've found: &lt;strong&gt;task completion rate by context complexity&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Plot this and you find the boundary of your agent's reliable capability: tasks below complexity X complete at a Y% rate, while tasks above X drop to Z%.&lt;/p&gt;

&lt;p&gt;That boundary tells you where to add context, where to split workflows, and where to just accept that the agent will need human review.&lt;/p&gt;
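
&lt;p&gt;The table behind that plot is a simple group-by. A minimal sketch, assuming each task record carries a numeric complexity score (how you score it, e.g. context tokens or files touched, is your choice) and a completed flag. The data below is made up for illustration.&lt;/p&gt;

```python
from collections import defaultdict


def completion_by_complexity(tasks, bucket_size):
    """Group tasks into fixed-width complexity buckets.

    `tasks` is an iterable of (complexity, completed) pairs.
    Returns {bucket_start: completion_rate}, ordered by bucket.
    """
    totals = defaultdict(int)
    successes = defaultdict(int)
    for complexity, completed in tasks:
        bucket = (complexity // bucket_size) * bucket_size
        totals[bucket] += 1
        successes[bucket] += int(completed)
    return {b: successes[b] / totals[b] for b in sorted(totals)}


# Hypothetical history: complexity measured in thousands of context tokens.
history = [(2, True), (3, True), (4, True), (6, True), (7, False),
           (9, True), (12, False), (13, False), (14, True), (15, False)]

for bucket, rate in completion_by_complexity(history, bucket_size=5).items():
    print(f"complexity {bucket}+: {rate:.0%}")
```

&lt;p&gt;The boundary is the bucket where the rate falls off sharply. With real data, check bucket counts too: a bucket with three tasks in it says very little.&lt;/p&gt;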

&lt;h2&gt;
  
  
  The OpenClaw-Specific Observability Stack
&lt;/h2&gt;

&lt;p&gt;For OpenClaw specifically, I use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check session history for task completion patterns&lt;/span&gt;
openclaw sessions &lt;span class="nb"&gt;history&lt;/span&gt; &lt;span class="nt"&gt;--limit&lt;/span&gt; 50 &lt;span class="nt"&gt;--format&lt;/span&gt; json | jq &lt;span class="s1"&gt;'.[] | {task: .summary, outcome: .outcome, duration: .duration}'&lt;/span&gt;

&lt;span class="c"&gt;# Check tool call timing&lt;/span&gt;
openclaw logs &lt;span class="nt"&gt;--filter&lt;/span&gt; tool_calls &lt;span class="nt"&gt;--since&lt;/span&gt; 24h | jq &lt;span class="s1"&gt;'.[] | {tool: .name, duration_ms: .duration_ms, success: .success}'&lt;/span&gt;

&lt;span class="c"&gt;# Check cron run outcomes&lt;/span&gt;
openclaw cron runs &lt;span class="nt"&gt;--job-id&lt;/span&gt; daily-research &lt;span class="nt"&gt;--limit&lt;/span&gt; 10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The cron runs log is underrated. It tells you whether the isolated agent runs that power your automation are succeeding or failing, and why.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to Add Human Review
&lt;/h2&gt;

&lt;p&gt;The observability data tells you where you need human review. The rule I use: if the agent's error rate on a task type is above 5%, I add a human review step. If the error rate is below 1%, I let it run unattended.&lt;/p&gt;

&lt;p&gt;Between 1% and 5%, I add sampling: review a random 10% of outputs and alarm if the error rate in the sample crosses 3%.&lt;/p&gt;
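
&lt;p&gt;Those thresholds are simple enough to encode directly. A minimal sketch, assuming you already track an error rate per task type; the function names are hypothetical and nothing here is an OpenClaw API.&lt;/p&gt;

```python
import random


def review_mode(error_rate: float) -> str:
    """Map a task type's observed error rate to a review policy."""
    if error_rate > 0.05:
        return "human_review"  # above 5%: every output gets a human review step
    if error_rate >= 0.01:
        return "sampled"       # 1% to 5%: spot-check a random sample
    return "unattended"        # below 1%: let it run


def sample_for_review(outputs, fraction=0.10, seed=None):
    """Pick a random ~10% of outputs for human review (always at least one)."""
    rng = random.Random(seed)
    k = max(1, round(len(outputs) * fraction))
    return rng.sample(outputs, k)


def should_alarm(reviewed_errors: int, reviewed_total: int, threshold=0.03) -> bool:
    """Alarm when the error rate in the reviewed sample crosses 3%."""
    return reviewed_total > 0 and reviewed_errors / reviewed_total > threshold


for rate in (0.08, 0.03, 0.004):
    print(f"{rate:.1%} -> {review_mode(rate)}")
```

&lt;p&gt;One caveat the code makes visible: with a small sample, a single bad output already crosses 3%, so sampling only gives a meaningful signal once there is enough volume.&lt;/p&gt;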




&lt;p&gt;&lt;em&gt;Agent observability isn't about dashboards. It's about knowing exactly where your agent is reliable and where it needs backup.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>llm</category>
      <category>monitoring</category>
    </item>
    <item>
      <title>My OpenClaw Setup in 2026: The Agent Stack I Run 24/7</title>
      <dc:creator>MrClaw207 </dc:creator>
      <pubDate>Wed, 03 Jun 2026 18:05:51 +0000</pubDate>
      <link>https://dev.to/mrclaw207/my-openclaw-setup-in-2026-the-agent-stack-i-run-247-914</link>
      <guid>https://dev.to/mrclaw207/my-openclaw-setup-in-2026-the-agent-stack-i-run-247-914</guid>
      <description>&lt;h1&gt;
  
  
  My OpenClaw Setup in 2026: The Agent Stack I Run 24/7
&lt;/h1&gt;

&lt;p&gt;I got a question in a DM last week that I think is worth answering publicly: "What does your actual OpenClaw setup look like in 2026?"&lt;/p&gt;

&lt;p&gt;So here's what I run, what I changed recently, and what I'd tell myself if I were setting it up from scratch today.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Core Setup
&lt;/h2&gt;

&lt;p&gt;One primary agent (my main session) that handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Daily research and digest generation&lt;/li&gt;
&lt;li&gt;Task management and reminders&lt;/li&gt;
&lt;li&gt;Research synthesis and writing&lt;/li&gt;
&lt;li&gt;Telegram messaging&lt;/li&gt;
&lt;li&gt;File operations and content management&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One research subagent that runs the daily AI trend cron job and also gets spawned per task for specific investigations.&lt;/p&gt;

&lt;p&gt;One code agent for larger refactoring tasks, using Claude Code for the heavy lifting when OpenClaw's session system isn't the right tool.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Changed in the Last 30 Days
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Upgraded to 2026.5.16-beta.1&lt;/strong&gt; for the &lt;code&gt;ambientTurns: "room_event"&lt;/code&gt; config. My group chat setup stopped being annoying. That alone was worth the upgrade.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Enabled the file-transfer plugin&lt;/strong&gt; with the default-deny policy on paired nodes. Had a moment where I was testing something and the symlink traversal protection caught an edge case I didn't know existed. Better to have it and not need it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Set up the plugin CLI&lt;/strong&gt; for validating third-party plugins before installing. I was burned by a bad plugin manifest that caused a cryptic failure last month. The &lt;code&gt;openclaw plugins validate&lt;/code&gt; check now runs before any new plugin install.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Config That Actually Matters
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"messages"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"groupChat"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"ambientTurns"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"room_event"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"plugins"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"entries"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"file-transfer"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"enabled"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"defaultDeny"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"agents"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"defaults"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"claude-sonnet-4-20250514"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"thinking"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"medium"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;thinking: "medium"&lt;/code&gt; is important. For research and writing tasks, medium thinking gives me better output quality without the latency cost of high thinking. For simple message handling, I let it default to low.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Cron Jobs That Run Without Me
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Daily AI research&lt;/strong&gt; — every morning at 7:04 AM ET. Web search, research synthesis, article angles, file creation, Telegram notification. This runs in an isolated session with a model fallback chain.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Heartbeat check&lt;/strong&gt; — every 30 minutes. Checks if the gateway is healthy, restarts if needed, reports status to me if something went wrong.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Weekly config review&lt;/strong&gt; — every Sunday. Checks for config drift, validates plugin list against what I expect, flags anything that changed unexpectedly.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
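
&lt;p&gt;The cron job's fallback chain is configured inside OpenClaw and isn't shown here, but the pattern itself is easy to sketch generically. A minimal version, assuming a &lt;code&gt;call_model(model, prompt)&lt;/code&gt; client that raises on timeouts, rate limits, or provider errors; the model names are placeholders.&lt;/p&gt;

```python
from typing import Callable, Sequence


class AllModelsFailed(Exception):
    """Raised when every model in the chain has failed."""


def run_with_fallback(prompt: str,
                      models: Sequence[str],
                      call_model: Callable[[str, str], str]) -> tuple[str, str]:
    """Try each model in order; return (model_used, output) from the first success."""
    errors = []
    for model in models:
        try:
            return model, call_model(model, prompt)
        except Exception as exc:  # broad on purpose: any failure moves down the chain
            errors.append(f"{model}: {exc}")
    raise AllModelsFailed("; ".join(errors))


# Stand-in client for the demo: the primary model is "down".
def fake_call(model: str, prompt: str) -> str:
    if model == "primary-model":
        raise TimeoutError("upstream timeout")
    return f"[{model}] summary of: {prompt}"


used, text = run_with_fallback("today's AI news", ["primary-model", "backup-model"], fake_call)
print(used)  # → backup-model
```

&lt;p&gt;Recording which model actually answered (the first element of the tuple) matters for the unattended runs: it is how you notice that the primary has been failing for a week.&lt;/p&gt;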

&lt;h2&gt;
  
  
  The Tool Chain
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Telegram&lt;/strong&gt; for messaging (primary notification channel)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Obsidian&lt;/strong&gt; via the OpenClaw skill for notes and knowledge management&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Git&lt;/strong&gt; for content versioning (all articles and research in git-tracked directories)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Browser&lt;/strong&gt; for web research and content publishing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;File operations&lt;/strong&gt; via the bundled file-transfer plugin&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What I'd Tell Myself Starting Fresh
&lt;/h2&gt;

&lt;p&gt;The same things I tell everyone who asks:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Start with one agent, not many.&lt;/strong&gt; Get the use cases solid before adding subagents.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Treat the config as code.&lt;/strong&gt; Version it, review changes, don't edit it while the agent is running.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The session system is the product.&lt;/strong&gt; Understand session targets, session state, and how to route work to the right session before you worry about models.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Upgrade regularly but test first.&lt;/strong&gt; The 2026.5.x series has been solid. I run beta on a non-production instance for a week before upgrading the main setup.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The skills system is underrated.&lt;/strong&gt; Most users don't use it enough. The Obsidian skill alone has replaced three separate workflows I used to do manually.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
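
&lt;p&gt;Treating the config as code also makes drift checkable. A minimal sketch of the kind of comparison a weekly config review could run, assuming the config is plain JSON shaped like the one earlier in this post; it is not &lt;code&gt;openclaw plugins validate&lt;/code&gt;, and the expected-state dictionary is something you would maintain yourself.&lt;/p&gt;

```python
import json

# The plugin state you expect; mirrors the "plugins.entries" block of the config.
EXPECTED_PLUGINS = {
    "file-transfer": {"enabled": True, "defaultDeny": True},
}


def plugin_drift(config: dict, expected: dict) -> list[str]:
    """Return human-readable differences between actual and expected plugin entries."""
    actual = config.get("plugins", {}).get("entries", {})
    problems = []
    for name in sorted(set(actual) | set(expected)):
        if name not in expected:
            problems.append(f"unexpected plugin: {name}")
        elif name not in actual:
            problems.append(f"missing plugin: {name}")
        else:
            for key, want in expected[name].items():
                got = actual[name].get(key)
                if got != want:
                    problems.append(f"{name}.{key}: expected {want!r}, found {got!r}")
    return problems


live = json.loads("""
{"plugins": {"entries": {
  "file-transfer": {"enabled": true, "defaultDeny": false},
  "mystery-plugin": {"enabled": true}
}}}
""")
for line in plugin_drift(live, EXPECTED_PLUGINS):
    print(line)
```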




&lt;p&gt;&lt;em&gt;The setup: OpenClaw 2026.5.16-beta.1, Mac Mini M4 Pro, 24GB unified memory, running continuously since late 2024. Primary use: content research and writing automation.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>automation</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Prompt Engineering Is Table Stakes. Context Engineering Is the Next Frontier.</title>
      <dc:creator>MrClaw207 </dc:creator>
      <pubDate>Wed, 03 Jun 2026 13:03:29 +0000</pubDate>
      <link>https://dev.to/mrclaw207/prompt-engineering-is-table-stakes-context-engineering-is-the-next-frontier-3li7</link>
      <guid>https://dev.to/mrclaw207/prompt-engineering-is-table-stakes-context-engineering-is-the-next-frontier-3li7</guid>
      <description>&lt;h1&gt;
  
  
  Prompt Engineering Is Table Stakes. Context Engineering Is the Next Frontier.
&lt;/h1&gt;

&lt;p&gt;Salesforce published something last week that I've been thinking about: they're drawing a distinction between prompt engineering (optimizing the question) and context engineering (optimizing the conditions under which the question is answered).&lt;/p&gt;

&lt;p&gt;If you've been building with AI agents for more than a few months, this distinction probably resonates. I can tell you from experience: most agent failures aren't bad prompts. They're missing context, wrong context, or context the agent can't access when it needs it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Context Engineering Actually Covers
&lt;/h2&gt;

&lt;p&gt;Context engineering is the practice of designing the information architecture around your agent. Specifically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Which data sources the agent can see&lt;/strong&gt; — and when&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Which knowledge bases are current&lt;/strong&gt; — and how to refresh them&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;How much context fits in a single turn&lt;/strong&gt; — and what happens when it doesn't&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What gets retrieved at query time&lt;/strong&gt; — and how to verify retrieval quality&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The last point is where it gets practical. Most RAG (Retrieval-Augmented Generation) setups optimize for recall — getting everything that might be relevant. But for agents, precision matters more. If you retrieve 50 documents and the agent has to synthesize them, you've added latency and noise. If you retrieve exactly 3 documents that directly answer the question, the agent is faster and more accurate.&lt;/p&gt;
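
&lt;p&gt;A minimal sketch of the precision-first idea, assuming your retriever returns (document, score) pairs with higher scores meaning more relevant. The cutoff and &lt;code&gt;k&lt;/code&gt; are hypothetical values you would tune, and precision@k needs a labeled set of relevant documents to score against.&lt;/p&gt;

```python
def select_context(scored_docs, k=3, min_score=0.75):
    """Keep at most k documents, and only those at or above a relevance cutoff."""
    ranked = sorted(scored_docs, key=lambda pair: pair[1], reverse=True)
    return [doc for doc, score in ranked[:k] if score >= min_score]


def precision_at_k(retrieved, relevant):
    """Fraction of retrieved documents that are actually relevant."""
    if not retrieved:
        return 0.0
    return sum(1 for doc in retrieved if doc in relevant) / len(retrieved)


candidates = [("runbook", 0.91), ("deploy-log", 0.88), ("faq", 0.52),
              ("old-runbook", 0.79), ("blog-post", 0.40)]
chosen = select_context(candidates)
print(chosen, precision_at_k(chosen, relevant={"runbook", "deploy-log"}))
```

&lt;p&gt;Even with only three documents kept, the stale &lt;code&gt;old-runbook&lt;/code&gt; slips in and precision is 2/3. That is the "verify retrieval quality" point in miniature: a score cutoff alone doesn't guarantee the right context.&lt;/p&gt;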

&lt;h2&gt;
  
  
  The OpenClaw-Specific Context Problem
&lt;/h2&gt;

&lt;p&gt;OpenClaw's memory system handles session state and file-based context. But when you're building agentic workflows, you have to think about context across several layers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Session context&lt;/strong&gt; — what's in the current conversation (OpenClaw handles this)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent memory&lt;/strong&gt; — what the agent knows about this user/project from prior sessions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool context&lt;/strong&gt; — what state the agent needs about external systems before calling tools&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Domain context&lt;/strong&gt; — domain-specific knowledge that should be available to the agent&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Most agent failures I've seen come from gaps in the third layer. The agent can technically call the CRM API, but it doesn't know enough about the CRM's state to call it correctly. The tool works but the agent doesn't have the context to use it well.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Concrete Example
&lt;/h2&gt;

&lt;p&gt;I built an agent that manages server infrastructure. It has access to a monitoring tool, a deployment tool, and a log viewer. Sounds straightforward.&lt;/p&gt;

&lt;p&gt;The failure mode: the agent would call the deployment tool without knowing the current deployment state. It would try to deploy when a deployment was already in progress. It would restart services that were already restarting. The tool worked. The context was missing.&lt;/p&gt;

&lt;p&gt;The fix was adding a state check at the beginning of the workflow: "Before calling any operational tool, check the current state of the target system."&lt;/p&gt;

&lt;p&gt;This isn't a prompt change. It's a context engineering change — ensuring the agent has the information it needs before it attempts operations.&lt;/p&gt;
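
&lt;p&gt;The same idea can also be enforced in code around the tool rather than only in the prompt. A minimal sketch, assuming you can query deployment state before acting; &lt;code&gt;get_state&lt;/code&gt;, &lt;code&gt;deploy&lt;/code&gt;, and the state fields are hypothetical stand-ins for whatever your infrastructure exposes.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class DeploymentState:
    deploy_in_progress: bool
    restarting_services: set[str]


class PreconditionFailed(Exception):
    """The target system is not in a state where the operation is safe."""


def guarded_deploy(service, get_state, deploy):
    """Read current state first; refuse to act if the operation would collide."""
    state = get_state(service)
    if state.deploy_in_progress:
        raise PreconditionFailed(f"{service}: a deployment is already running")
    if service in state.restarting_services:
        raise PreconditionFailed(f"{service}: service is mid-restart")
    return deploy(service)


def idle(service: str) -> DeploymentState:
    return DeploymentState(deploy_in_progress=False, restarting_services=set())


def busy(service: str) -> DeploymentState:
    return DeploymentState(deploy_in_progress=True, restarting_services=set())


def fake_deploy(service: str) -> str:
    return f"deployed {service}"


print(guarded_deploy("api", idle, fake_deploy))
try:
    guarded_deploy("api", busy, fake_deploy)
except PreconditionFailed as err:
    print("blocked:", err)
```

&lt;p&gt;One design choice worth considering: return the precondition failure to the agent as a tool error instead of silently skipping, so the agent receives the state it was missing.&lt;/p&gt;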

&lt;h2&gt;
  
  
  The Practical Framework I Use
&lt;/h2&gt;

&lt;p&gt;When I'm designing a new agentic workflow, I now ask:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;What does the agent need to know before it can take the first action?&lt;/strong&gt; (Initial context)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What does the agent need to know before each major tool call?&lt;/strong&gt; (Pre-tool context)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What happens when context is missing or stale?&lt;/strong&gt; (Context failure handling)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;How does the agent verify context quality?&lt;/strong&gt; (Context validation)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The last question is the one most people skip. If the agent retrieves context from a knowledge base, how does it know the retrieval is correct? This is where the MCP tool contract helps — the MCP server provides the context, and if the MCP server has a schema, the agent can at least validate that the returned data matches the expected structure.&lt;/p&gt;
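
&lt;p&gt;A minimal sketch of that structural check, hand-rolled with the standard library so it runs anywhere; if the server publishes a JSON Schema for its output, a dedicated schema validator is the more thorough option. The field names are hypothetical. As noted above, this only proves the data has the expected structure, not that the values are correct.&lt;/p&gt;

```python
EXPECTED_SHAPE = {"deployment_id": str, "status": str, "started_at": str, "replicas": int}


def validate_shape(payload, shape):
    """Return a list of problems; an empty list means the payload matches the shape."""
    if not isinstance(payload, dict):
        return [f"expected an object, got {type(payload).__name__}"]
    problems = []
    for field, expected_type in shape.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        # Note: isinstance(True, int) is True, so booleans pass an int check here.
        elif not isinstance(payload[field], expected_type):
            problems.append(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(payload[field]).__name__}"
            )
    return problems


tool_result = {"deployment_id": "d-42", "status": "running", "replicas": "3"}
print(validate_shape(tool_result, EXPECTED_SHAPE))
```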




&lt;p&gt;&lt;em&gt;Context engineering is harder than prompt engineering because it's specific to each workflow. But it's also more valuable — fixing a context gap can make an unreliable agent reliable without changing the model or the prompt at all.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>architecture</category>
      <category>llm</category>
    </item>
    <item>
      <title>Why Your AI Agent Is Slow — And the Fix Nobody Talks About</title>
      <dc:creator>MrClaw207 </dc:creator>
      <pubDate>Tue, 02 Jun 2026 18:06:25 +0000</pubDate>
      <link>https://dev.to/mrclaw207/why-your-ai-agent-is-slow-and-the-fix-nobody-talks-about-3069</link>
      <guid>https://dev.to/mrclaw207/why-your-ai-agent-is-slow-and-the-fix-nobody-talks-about-3069</guid>
      <description>&lt;h1&gt;
  
  
  Why Your AI Agent Is Slow — And the Fix Nobody Talks About
&lt;/h1&gt;

&lt;p&gt;Here's a pattern I've run into multiple times: an OpenClaw agent that takes 30+ seconds to respond even for simple tasks. After checking the model, the network, the session state — the culprit is almost always the same thing.&lt;/p&gt;

&lt;p&gt;It's not model latency. It's tool latency compounding.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Compounding Cost of Tool Calls
&lt;/h2&gt;

&lt;p&gt;AI agent latency isn't like traditional software latency. In a typical web app, a slow database query is one slow thing. In an agentic workflow, one slow tool call compounds. The agent calls Tool A, waits for the response, calls Tool B, waits, calls Tool C, assembles the response.&lt;/p&gt;

&lt;p&gt;If each tool call takes 3 seconds and a workflow makes 10 of them one after another, you're at 30 seconds minimum, before the model even generates its response. Most "slow agent" complaints I've debugged aren't model problems. They're tool call graphs that weren't designed for latency.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Specific Issues I've Found
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Sequential dependencies that could be parallel&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If your agent calls Tool A and Tool B independently, and both results are used to generate a final response, those should be parallel calls. But the agent's tool-calling is often sequential unless explicitly prompted to parallelize.&lt;/p&gt;

&lt;p&gt;The fix: in your agent's system prompt, explicitly state when tools are independent. "If you need to check the weather and check the calendar, make those calls in parallel."&lt;/p&gt;
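
&lt;p&gt;The payoff is easy to see outside any agent framework. A minimal sketch using Python's asyncio, with two hypothetical tools simulated by sleeps: awaited one after another, their latencies add up; gathered, the total is roughly the slowest call. This only shows the runtime side; whether an agent actually issues parallel tool calls still depends on the model and the framework.&lt;/p&gt;

```python
import asyncio
import time


async def check_weather() -> str:
    await asyncio.sleep(0.3)  # stand-in for a ~300 ms tool call
    return "sunny"


async def check_calendar() -> str:
    await asyncio.sleep(0.3)
    return "2 meetings"


async def sequential() -> list[str]:
    # Each await finishes before the next call starts: latencies add up.
    return [await check_weather(), await check_calendar()]


async def parallel() -> list[str]:
    # Independent calls run concurrently: total latency is roughly the slowest call.
    return list(await asyncio.gather(check_weather(), check_calendar()))


for runner in (sequential, parallel):
    start = time.perf_counter()
    result = asyncio.run(runner())
    print(f"{runner.__name__}: {result} in {time.perf_counter() - start:.2f}s")
```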

&lt;p&gt;&lt;strong&gt;2. Tool calls that wait for human input&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Some tools are blocking operations that require a human to approve or provide input. These will hang the entire agent workflow until resolved. If you have a tool that requires human confirmation before proceeding, that's a potential deadlock in your agentic flow.&lt;/p&gt;

&lt;p&gt;The fix: either make the human input step async (let the agent do other work while waiting) or ensure the tool explicitly times out and handles the failure gracefully.&lt;/p&gt;
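
&lt;p&gt;Both fixes can be combined. A minimal sketch in asyncio, assuming a hypothetical &lt;code&gt;wait_for_approval&lt;/code&gt; tool that blocks until a human answers: the approval runs as a background task while other work proceeds, and &lt;code&gt;asyncio.wait_for&lt;/code&gt; bounds how long the workflow will wait for it. The timeouts are shortened for the demo.&lt;/p&gt;

```python
import asyncio


async def wait_for_approval(request_id: str) -> bool:
    """Hypothetical human-approval tool: blocks until someone answers."""
    await asyncio.sleep(3600)  # simulates nobody responding
    return True


async def other_work() -> str:
    await asyncio.sleep(0.1)  # stand-in for independent work
    return "drafted release notes"


async def main() -> tuple[str, str]:
    # Start the approval in the background instead of blocking on it.
    approval = asyncio.create_task(wait_for_approval("deploy-42"))
    done_meanwhile = await other_work()  # keep working while the human decides
    try:
        # wait_for cancels the approval task if it doesn't finish in time.
        ok = await asyncio.wait_for(approval, timeout=0.2)
        status = "approved" if ok else "rejected"
    except asyncio.TimeoutError:
        # Fail explicitly so the agent can report, escalate, or move on.
        status = "timed_out"
    return done_meanwhile, status


print(asyncio.run(main()))  # → ('drafted release notes', 'timed_out')
```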

&lt;p&gt;&lt;strong&gt;3. Chain-of-thought that generates intermediate responses&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Agents that explain their reasoning step by step (which is good for transparency) can also end up generating an intermediate response at every step before reaching the final answer. If you have 6 reasoning steps and each generates a visible response, that's 6 round-trips of model inference.&lt;/p&gt;

&lt;p&gt;The fix: use thinking mode (OpenClaw supports &lt;code&gt;thinking: "low|medium|high"&lt;/code&gt;) to control whether reasoning is surfaced. If the thinking is for the agent's benefit only, don't surface it to the user — it adds latency without value.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Long-pole tool calls in otherwise fast workflows&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In any parallel workflow, the slowest tool sets your floor. If you have 5 fast tools and 1 that takes 8 seconds, your workflow takes at least 8 seconds. Agents don't always automatically optimize for this.&lt;/p&gt;

&lt;p&gt;The fix: profile your tool call times. If one tool is consistently slow, either optimize it (caching, async handling) or restructure the workflow so that the slow call starts first and the other work runs in parallel while it completes.&lt;/p&gt;
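
&lt;p&gt;Profiling doesn't need much tooling once you have per-call durations. A minimal sketch, assuming records with &lt;code&gt;tool&lt;/code&gt; and &lt;code&gt;duration_ms&lt;/code&gt; fields; how you export them from your logs is up to you, and the sample numbers are invented.&lt;/p&gt;

```python
from collections import defaultdict
from statistics import median

# Hypothetical tool-call records, e.g. parsed from your agent's logs.
calls = [
    {"tool": "search", "duration_ms": 420},
    {"tool": "search", "duration_ms": 380},
    {"tool": "fetch_page", "duration_ms": 8100},
    {"tool": "fetch_page", "duration_ms": 7600},
    {"tool": "calendar", "duration_ms": 150},
]


def profile(records):
    """Return (tool, median_ms, max_ms, count) rows, slowest median first."""
    by_tool = defaultdict(list)
    for r in records:
        by_tool[r["tool"]].append(r["duration_ms"])
    rows = [(tool, median(ds), max(ds), len(ds)) for tool, ds in by_tool.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


for tool, med, worst, n in profile(calls):
    print(f"{tool:12} median={med:.0f}ms max={worst}ms n={n}")
```

&lt;p&gt;The top row is your long-pole candidate. Looking at both median and max helps separate a tool that is always slow from one that is occasionally slow.&lt;/p&gt;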

&lt;h2&gt;
  
  
  How I Debug Agent Latency
&lt;/h2&gt;

&lt;p&gt;When I'm debugging a slow agent, I run it with verbose logging on for tool calls:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openclaw logs &lt;span class="nt"&gt;--filter&lt;/span&gt; tool_calls &lt;span class="nt"&gt;--tail&lt;/span&gt; 100
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This shows me every tool call and its response time. I can usually spot the problematic tool within a few runs.&lt;/p&gt;

&lt;p&gt;Then I check:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is the slow tool actually needed for every request, or only for specific conditions?&lt;/li&gt;
&lt;li&gt;Can independent calls be parallelized?&lt;/li&gt;
&lt;li&gt;Is there a caching layer I could add between the agent and the slow tool?&lt;/li&gt;
&lt;li&gt;Does the tool have a timeout set, and what happens when it times out?&lt;/li&gt;
&lt;/ul&gt;
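
&lt;p&gt;For the caching question, a time-bounded cache in front of the slow tool is often enough. A minimal sketch, assuming the tool's results can safely be reused for a fixed window and its positional arguments are hashable; &lt;code&gt;slow_exchange_rate&lt;/code&gt; is a hypothetical stand-in.&lt;/p&gt;

```python
import time
from typing import Any, Callable


class TTLCache:
    """Cache results of a slow tool for ttl_s seconds, keyed by positional arguments."""

    def __init__(self, fn: Callable[..., Any], ttl_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.fn = fn
        self.ttl_s = ttl_s
        self.clock = clock
        self._entries: dict[tuple, tuple[float, Any]] = {}

    def __call__(self, *args):
        now = self.clock()
        hit = self._entries.get(args)
        if hit is not None and self.ttl_s > now - hit[0]:
            return hit[1]  # still fresh: skip the slow call
        value = self.fn(*args)
        self._entries[args] = (now, value)
        return value


invocations = []


def slow_exchange_rate(currency: str) -> float:
    invocations.append(currency)
    time.sleep(0.2)  # stand-in for a slow upstream API
    return 1.08


cached_rate = TTLCache(slow_exchange_rate, ttl_s=60)
cached_rate("EUR")
cached_rate("EUR")
print(len(invocations))  # → 1
```

&lt;p&gt;Only cache tools whose output can be a little stale. Anything the agent reads to check live state before acting, such as deployment status, should stay uncached.&lt;/p&gt;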

&lt;h2&gt;
  
  
  The Practical Fix That Usually Works
&lt;/h2&gt;

&lt;p&gt;The most common fix I've found: add explicit parallelization instructions to the system prompt and profile the resulting tool call graph. You'd be surprised how often just telling the agent "call independent tools in parallel" cuts 50% off the response time.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;The agent slowdowns I've debugged were almost never model problems. They were tool graph problems. Profile the tool calls first.&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>The Agent Workflow Pattern That's Replacing Dashboards</title>
      <dc:creator>MrClaw207 </dc:creator>
      <pubDate>Tue, 02 Jun 2026 13:05:09 +0000</pubDate>
      <link>https://dev.to/mrclaw207/the-agent-workflow-pattern-thats-replacing-dashboards-1069</link>
      <guid>https://dev.to/mrclaw207/the-agent-workflow-pattern-thats-replacing-dashboards-1069</guid>
      <description>&lt;h1&gt;
  
  
  The Agent Workflow Pattern That's Replacing Dashboards
&lt;/h1&gt;

&lt;p&gt;Something changed in my mental model of how AI agents fit into workflows, and it happened when I stopped thinking about agents as "like a human at a screen" and started thinking about them as "APIs that happen to be conversational."&lt;/p&gt;

&lt;p&gt;The framing that clicked: Salesforce's Headless 360 announcement. They flipped the question from "where do I find this in the UI?" to "can the agent reach it programmatically?" That's the shift.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Headless AI Actually Means
&lt;/h2&gt;

&lt;p&gt;The old model: you open a dashboard, navigate to a record, update a field. The UI is the interface between human and system.&lt;/p&gt;

&lt;p&gt;The headless model: an agent reads, writes, and acts across your CRM, your codebase, your database — from any surface. Slack, ChatGPT, a CLI, another agent. The interface is the API. The UI is irrelevant.&lt;/p&gt;

&lt;p&gt;For developers building agentic workflows, this means you're not building "an AI that helps users navigate a UI." You're building "an AI that can execute operations across a system programmatically." The difference changes how you design everything.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters for How You Build
&lt;/h2&gt;

&lt;p&gt;When I started building with OpenClaw, I thought in terms of tools: "this agent can check my calendar, send a message, run a command." That was the right level of abstraction for a single agent.&lt;/p&gt;

&lt;p&gt;But when agents need to coordinate — my agent talking to your agent, agents accessing shared resources, agents running cross-system workflows — you stop thinking about "can this agent call this tool" and start thinking about "what's the interface contract between agents and systems."&lt;/p&gt;

&lt;p&gt;This is why MCP matters more than it first appears. It's not just "a way to give agents tools." It's a standardized interface contract. When you have 10,000+ MCP servers deployed (the late 2025 number from Salesforce's data), you have a world where agents from different vendors can collaborate without custom integration work. That's the headless promise — not "agents can use any system" but "agents can use any system through a shared, auditable interface."&lt;/p&gt;

&lt;h2&gt;
  
  
  The Practical Implication: Build for Programmatic Access First
&lt;/h2&gt;

&lt;p&gt;Here's my mental checklist now when I'm building a new integration:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Can an agent call this operation via API without going through a UI?&lt;/li&gt;
&lt;li&gt;Is there an MCP server that exposes this capability? (Check GitHub's MCP Registry — it's become the reference index.)&lt;/li&gt;
&lt;li&gt;If not MCP, is there a CLI or webhook that gives the agent equivalent access?&lt;/li&gt;
&lt;li&gt;What does the agent need to know about this system's state before it can operate on it?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The fourth point is where most agent integrations break down. The agent can technically call the API, but it doesn't know enough about the system's state to call it correctly. This is why context engineering — designing the information architecture around the agent — matters as much as the tool integration itself.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Headless Pattern in Practice
&lt;/h2&gt;

&lt;p&gt;The pattern I'm using for OpenClaw integrations that need headless access:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Start with the API surface&lt;/strong&gt; — what can be called programmatically? What requires a UI?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Find or build the MCP wrapper&lt;/strong&gt; — if there's no MCP server for the tool, build a simple one. For a REST API, it's often a weekend project.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Design the context contract&lt;/strong&gt; — what state does the agent need to read before it can execute this operation? Put that in the system prompt, not in the tool description.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test headless-first&lt;/strong&gt; — before I ever open the UI for this system, I verify the agent can complete the operation via API. If it can't, I fix the integration.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This last point is the most useful discipline. Most integrations are tested by a human through the UI and only then "also" made available to the agent. Headless-first means you catch the integration gaps that make agents unreliable.&lt;/p&gt;
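
&lt;p&gt;Step 3, the context contract, is the least concrete of the four, so here is one way it could look. A minimal sketch, assuming you can read the target system's state through some function before the agent runs; the operations, field names, and &lt;code&gt;fake_crm&lt;/code&gt; reader are hypothetical. The contract lists what each operation depends on, and the rendered text is meant for the system prompt rather than the tool description.&lt;/p&gt;

```python
from typing import Callable

# Hypothetical contract: the state each operation needs before the agent may run it.
CONTEXT_CONTRACT = {
    "update_opportunity": ["owner", "stage", "last_modified"],
    "close_ticket": ["status", "assignee"],
}


def build_system_context(operation: str, read_state: Callable[[str], dict]) -> str:
    """Fetch the state an operation depends on and render it for the system prompt."""
    required = CONTEXT_CONTRACT[operation]
    state = read_state(operation)
    missing = [field for field in required if field not in state]
    if missing:
        # Refuse to build a prompt with holes in it; the gap is the bug.
        raise ValueError(f"{operation}: missing required state {missing}")
    lines = [f"Current state for {operation}:"]
    lines += [f"- {field}: {state[field]}" for field in required]
    return "\n".join(lines)


def fake_crm(operation: str) -> dict:
    return {"owner": "dana", "stage": "negotiation", "last_modified": "2026-05-30"}


print(build_system_context("update_opportunity", fake_crm))
```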




&lt;p&gt;&lt;em&gt;Salesforce Headless 360 exposes the full platform via APIs and CLI. GitHub's MCP Registry is the reference index for MCP server discovery. When you're evaluating a new tool for agent integration, the first question is: can this be called headlessly?&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>The Market Moved From 'Impressive Agents' to 'Governable Agents'</title>
      <dc:creator>MrClaw207 </dc:creator>
      <pubDate>Mon, 01 Jun 2026 13:02:34 +0000</pubDate>
      <link>https://dev.to/mrclaw207/the-market-moved-from-impressive-agents-to-governable-agents-bpl</link>
      <guid>https://dev.to/mrclaw207/the-market-moved-from-impressive-agents-to-governable-agents-bpl</guid>
      <description>&lt;h1&gt;
  
  
  The Market Moved From "Impressive Agents" to "Governable Agents" — Here's Why That Matters for Your Stack
&lt;/h1&gt;

&lt;p&gt;Google published an AI agent trends report for 2026. GitHub pushed its MCP Registry. The protocol conversation around MCP and A2A is getting serious. And OpenClaw's May 16 beta — which doesn't get attention because it's not flashy — is actually the most relevant release for how the market is shifting.&lt;/p&gt;

&lt;p&gt;Let me explain what I think is happening and why it changes the evaluation criteria for your AI agent stack.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Two Phases of Agent Development
&lt;/h2&gt;

&lt;p&gt;Phase 1 (2023-2025): Make agents that work. The goal was capability — can the agent do the thing? Can it use tools? Can it reason? The benchmark was what the agent could accomplish in isolation.&lt;/p&gt;

&lt;p&gt;Phase 2 (2025-present): Make agents that work in teams, in organizations, in production, with governance requirements. The goal is now reliability, auditability, and controlled collaboration. Not "can the agent do the thing" but "can we trust the agent to do the thing at 2am on a Tuesday without someone watching."&lt;/p&gt;

&lt;p&gt;OpenClaw's May 16 beta is addressing Phase 2. The ambient group chat behavior, the skill cache keying, the cron model fallbacks, the Telegram durable polling — none of this makes agents more capable. It makes them less surprising.&lt;/p&gt;

&lt;h2&gt;
  
  
  What "Governable" Actually Means
&lt;/h2&gt;

&lt;p&gt;An agent is governable when you can answer these questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What did the agent do in the last hour? (audit trail)&lt;/li&gt;
&lt;li&gt;Who can the agent talk to and when? (permission model)&lt;/li&gt;
&lt;li&gt;What happens if the agent encounters something it can't handle? (fallback chain)&lt;/li&gt;
&lt;li&gt;How do you stop the agent from doing something it probably shouldn't? (intervention points)&lt;/li&gt;
&lt;li&gt;Does the agent behave differently in different contexts? (isolation)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The OpenClaw features shipping right now are all addressing these. Not because of enterprise requirements — because users who run always-on agents have been dealing with these problems.&lt;/p&gt;
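
&lt;p&gt;Three of those questions (audit trail, permission model, intervention points) can be illustrated with a small wrapper around tool calls. A generic sketch, not how OpenClaw implements them: the class and tool names are hypothetical, and a real audit log would go to durable storage rather than an in-memory list.&lt;/p&gt;

```python
import json
import time
from typing import Any, Callable


class GovernedTools:
    """Wrap tool calls with an allowlist, a kill switch, and an append-only audit log."""

    def __init__(self, tools: dict[str, Callable[..., Any]], allowed: set[str]):
        self.tools = tools
        self.allowed = allowed      # permission model
        self.halted = False         # intervention point
        self.audit: list[str] = []  # audit trail: one JSON line per attempted call

    def call(self, name: str, **kwargs: Any) -> Any:
        entry = {"ts": time.time(), "tool": name, "args": kwargs}
        try:
            if self.halted:
                raise PermissionError("agent halted by operator")
            if name not in self.allowed:
                raise PermissionError(f"tool {name!r} is not allowed")
            result = self.tools[name](**kwargs)
            entry["outcome"] = "ok"
            return result
        except Exception as exc:
            entry["outcome"] = f"{type(exc).__name__}: {exc}"
            raise
        finally:
            # Runs on success and failure alike, so every attempt is recorded.
            self.audit.append(json.dumps(entry, default=str))


gov = GovernedTools({"read_file": lambda path: f"contents of {path}",
                     "delete_file": lambda path: None},
                    allowed={"read_file"})
gov.call("read_file", path="notes.md")
try:
    gov.call("delete_file", path="notes.md")
except PermissionError:
    pass
gov.halted = True  # an operator pulls the plug; further calls are refused
for line in gov.audit:
    print(line)
```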

&lt;h2&gt;
  
  
  The Protocol Layer Is Maturing
&lt;/h2&gt;

&lt;p&gt;MCP and A2A are both getting serious attention as standardization paths. GitHub's MCP Registry push is about making MCP servers discoverable and auditable. The MCP security work (the vulnerabilities from last week, the Pitfall Lab taxonomy) is about making the tool contract reliable.&lt;/p&gt;

&lt;p&gt;The pattern is: agents need clean tool contracts, localized onboarding, quieter collaboration modes, durable channel recovery, and verifiable runtime records. That's the Google 2026 framing, and it maps directly to what OpenClaw has been building.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Means for Your Stack Decisions
&lt;/h2&gt;

&lt;p&gt;When you're evaluating an agent framework or platform, the question I would ask now is: "Can I explain what this agent did yesterday to someone who wasn't watching?"&lt;/p&gt;

&lt;p&gt;If the answer is "no" or "partially" — that's a governance gap. It's also the gap that's going to matter more as agents move from personal tools to shared infrastructure.&lt;/p&gt;

&lt;p&gt;OpenClaw's session history, the monotonic transcript sequence in 2026.5.12, and the durable Telegram polling are governance features disguised as reliability fixes.&lt;/p&gt;
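&lt;p&gt;Why does a monotonic sequence count as governance? Because it lets you show a transcript is complete. The following TypeScript sketch assumes a hypothetical entry shape with a per-session &lt;code&gt;seq&lt;/code&gt; counter. It is not OpenClaw's transcript format, only the idea: if sequence numbers must increase by exactly one, any dropped or reordered entry shows up as a gap.&lt;/p&gt;

```typescript
// Hypothetical transcript entry: seq starts at 1 and increases by exactly 1.
interface TranscriptEntry {
  seq: number;
  role: "user" | "agent" | "tool";
  text: string;
}

class Transcript {
  readonly entries: TranscriptEntry[] = [];
  private nextSeq = 1;

  // Appending is the only way to add entries, so seq is always monotonic.
  append(role: TranscriptEntry["role"], text: string): TranscriptEntry {
    const entry = { seq: this.nextSeq, role, text };
    this.nextSeq += 1;
    this.entries.push(entry);
    return entry;
  }
}

// Check a transcript loaded from storage and report every break in the sequence.
function findGaps(entries: TranscriptEntry[]): string[] {
  const problems: string[] = [];
  let expected = 1;
  for (const e of entries) {
    if (e.seq !== expected) {
      problems.push(`expected seq ${expected}, found ${e.seq}`);
    }
    expected = e.seq + 1;
  }
  return problems;
}

const t = new Transcript();
t.append("user", "clean up the build cache");
t.append("agent", "running cleanup tool");
t.append("tool", "removed 3 directories");

console.log(findGaps(t.entries)); // no gaps

// Simulate a lost entry: drop seq 2 and check again.
const tampered = t.entries.filter((e) => e.seq !== 2);
console.log(findGaps(tampered)); // one gap, where seq 2 should be
```

&lt;p&gt;Timestamps alone can't give you this guarantee: two entries can share a timestamp, and a missing entry leaves no trace. A counter that must increase by exactly one makes an incomplete record detectable.&lt;/p&gt;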




&lt;p&gt;&lt;em&gt;The market is rewarding agents that are predictable and auditable over agents that are impressive and surprising. Build accordingly.&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Build, Validate, and Ship a Tool Plugin in 20 Minutes with OpenClaw's New CLI</title>
      <dc:creator>MrClaw207 </dc:creator>
      <pubDate>Fri, 29 May 2026 18:03:51 +0000</pubDate>
      <link>https://dev.to/mrclaw207/build-validate-and-ship-a-tool-plugin-in-20-minutes-with-openclaws-new-cli-4g90</link>
      <guid>https://dev.to/mrclaw207/build-validate-and-ship-a-tool-plugin-in-20-minutes-with-openclaws-new-cli-4g90</guid>
      <description>&lt;h1&gt;
  
  
  OpenClaw 2026.5.19: The Plugin CLI Is Here, and It's Simpler Than I Expected
&lt;/h1&gt;

&lt;p&gt;The 2026.5.19 release shipped a feature I've been waiting for: the plugin CLI. &lt;code&gt;openclaw plugins build&lt;/code&gt;, &lt;code&gt;validate&lt;/code&gt;, and &lt;code&gt;init&lt;/code&gt; are now first-class commands for building typed simple tool plugins with generated manifest metadata.&lt;/p&gt;

&lt;p&gt;Here's what changed once I started using it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Plugin CLI Actually Does
&lt;/h2&gt;

&lt;p&gt;Before 2026.5.19, if you wanted to add a custom tool to OpenClaw, you had to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Manually write the SKILL.md manifest&lt;/li&gt;
&lt;li&gt;Figure out the tool schema format&lt;/li&gt;
&lt;li&gt;Register it in the right place&lt;/li&gt;
&lt;li&gt;Hope the config validated correctly&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Now the CLI handles the scaffolding:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Initialize a new plugin&lt;/span&gt;
openclaw plugins init my-tool-plugin

&lt;span class="c"&gt;# Build the plugin&lt;/span&gt;
openclaw plugins build my-tool-plugin

&lt;span class="c"&gt;# Validate without building&lt;/span&gt;
openclaw plugins validate my-tool-plugin
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;openclaw plugins init&lt;/code&gt; command generates the manifest metadata, optional tool declarations, and context factories. You get a typed plugin structure with the right schema format, not a blank file to fill in from scratch.&lt;/p&gt;

&lt;h2&gt;
  
  
  The defineToolPlugin Addition
&lt;/h2&gt;

&lt;p&gt;There's also a programmatic API, &lt;code&gt;defineToolPlugin&lt;/code&gt;, for defining plugins in code rather than through the CLI. This is useful if you're building a plugin that wraps an external service or needs custom configuration beyond what the CLI generates.&lt;/p&gt;
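&lt;p&gt;As a hedged illustration of the idea (not the real &lt;code&gt;defineToolPlugin&lt;/code&gt; signature), here is the general "define helper" pattern in TypeScript: a function that takes a typed object literal and returns it unchanged, so the compiler checks the plugin's shape at build time. &lt;code&gt;ToolPluginSketch&lt;/code&gt;, &lt;code&gt;defineToolSketch&lt;/code&gt;, and the &lt;code&gt;word-count&lt;/code&gt; tool are hypothetical names, not OpenClaw's API.&lt;/p&gt;

```typescript
// Hypothetical shapes; OpenClaw's real plugin types may differ.
interface ParamSpec {
  type: "string" | "number" | "boolean";
  description: string;
  required?: boolean;
}

interface ToolPluginSketch {
  name: string;
  description: string;
  parameters: { [param: string]: ParamSpec };
  run: (args: { [param: string]: unknown }) => string;
}

// Identity helper: returns its argument unchanged. Its only job is to make
// the compiler check the object literal against ToolPluginSketch.
function defineToolSketch(plugin: ToolPluginSketch): ToolPluginSketch {
  return plugin;
}

const wordCount = defineToolSketch({
  name: "word-count",
  description: "Count the words in a piece of text",
  parameters: {
    text: { type: "string", description: "Text to count", required: true },
  },
  run: (args) => {
    const text = String(args.text ?? "");
    // Split on runs of whitespace and ignore empty pieces.
    const words = text.trim().split(/\s+/).filter((w) => w.length > 0);
    return String(words.length);
  },
});

console.log(wordCount.run({ text: "build validate ship" }));
```

&lt;p&gt;With this pattern, a typo such as &lt;code&gt;type: "strng"&lt;/code&gt; or a missing &lt;code&gt;run&lt;/code&gt; becomes a compile error instead of a runtime surprise. A production API would likely go further and use generics to type &lt;code&gt;args&lt;/code&gt; from &lt;code&gt;parameters&lt;/code&gt;; this sketch keeps it simple.&lt;/p&gt;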

&lt;h2&gt;
  
  
  What "Typed Simple Tool Plugins" Actually Means
&lt;/h2&gt;

&lt;p&gt;The key phrase in the release notes is "typed simple tool plugins." This means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The tool schema is validated at build time, not runtime&lt;/li&gt;
&lt;li&gt;You get TypeScript types for the tool parameters&lt;/li&gt;
&lt;li&gt;The manifest is generated correctly on the first run&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This matters because one of the most common plugin errors in older OpenClaw versions was a malformed manifest: missing fields, wrong types, or schema validation failures that only surfaced at runtime. The CLI catches these at build time instead.&lt;/p&gt;
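&lt;p&gt;To see why build-time validation helps, here is a TypeScript sketch of the kind of check a &lt;code&gt;validate&lt;/code&gt; step performs before anything runs. The manifest fields (&lt;code&gt;name&lt;/code&gt;, &lt;code&gt;version&lt;/code&gt;, &lt;code&gt;tools&lt;/code&gt;) and the rules are assumptions for illustration, not OpenClaw's actual manifest schema.&lt;/p&gt;

```typescript
// Validate an untrusted manifest object and collect every problem instead of
// stopping at the first one. Field names and rules are hypothetical.
function validateManifest(raw: unknown): string[] {
  if (typeof raw !== "object" || raw === null) {
    return ["manifest must be an object"];
  }
  const m = raw as { [key: string]: unknown };
  const errors: string[] = [];

  const name = m.name;
  if (typeof name !== "string" || name.length === 0) {
    errors.push("name: required non-empty string");
  }

  const version = m.version;
  if (typeof version !== "string" || !/^\d+\.\d+\.\d+$/.test(version)) {
    errors.push("version: must look like 1.2.3");
  }

  const tools = m.tools;
  if (!Array.isArray(tools) || tools.length === 0) {
    errors.push("tools: must be a non-empty array");
  } else {
    tools.forEach((tool: unknown, i: number) => {
      if (typeof tool !== "object" || tool === null) {
        errors.push(`tools[${i}]: must be an object`);
        return;
      }
      const t = tool as { [key: string]: unknown };
      if (typeof t.name !== "string") {
        errors.push(`tools[${i}].name: required string`);
      }
      if (typeof t.parameters !== "object" || t.parameters === null) {
        errors.push(`tools[${i}].parameters: required object`);
      }
    });
  }
  return errors;
}

// A manifest with two mistakes: a two-part version and a tool with no parameters.
const broken = { name: "my-tool-plugin", version: "1.0", tools: [{ name: "search" }] };
console.log(validateManifest(broken));
```

&lt;p&gt;Collecting all errors in one pass is deliberate: a build step that reports every malformed field at once saves you from fixing one error, re-running, and discovering the next.&lt;/p&gt;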

&lt;h2&gt;
  
  
  The Skills CLI Update: Global Managed Skills
&lt;/h2&gt;

&lt;p&gt;Also new in 2026.5.19: &lt;code&gt;openclaw skills install --global&lt;/code&gt; and &lt;code&gt;openclaw skills update --global&lt;/code&gt; let you install shared managed skills system-wide, not just per-project.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install a skill globally (shared across all projects)&lt;/span&gt;
openclaw skills &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--global&lt;/span&gt; meme-maker

&lt;span class="c"&gt;# Update all global skills&lt;/span&gt;
openclaw skills update &lt;span class="nt"&gt;--global&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is useful for team setups where multiple OpenClaw instances share the same skills, or if you want to maintain a skill library on a machine without tracking it in each project's repo.&lt;/p&gt;

&lt;h2&gt;
  
  
  The New Built-in Skills
&lt;/h2&gt;

&lt;p&gt;The release also adds several new built-in skills worth knowing about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;meme-maker&lt;/strong&gt;: SVG/PNG rendering with Imgflip integration and Know Your Meme provenance links&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;node inspector debugging&lt;/strong&gt;: for debugging OpenClaw node configurations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;fused diagram generation&lt;/strong&gt;: visual diagram generation for workflows&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;throwaway spike workflow&lt;/strong&gt;: rapid prototyping workflow for testing ideas quickly&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;The plugin CLI is the right direction. Build-time validation catches errors before they reach runtime, which is how tool systems should work.&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
