<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Darko from Kilo</title>
    <description>The latest articles on DEV Community by Darko from Kilo (@kilocode).</description>
    <link>https://dev.to/kilocode</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3596172%2Fd582ef62-3145-486c-9eb1-cc50dfb22f58.png</url>
      <title>DEV Community: Darko from Kilo</title>
      <link>https://dev.to/kilocode</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kilocode"/>
    <language>en</language>
    <item>
      <title>Cloud Agents Just Got a Big Upgrade</title>
      <dc:creator>Darko from Kilo</dc:creator>
      <pubDate>Tue, 14 Apr 2026 12:21:19 +0000</pubDate>
      <link>https://dev.to/kilocode/cloud-agents-just-got-a-big-upgrade-i5g</link>
      <guid>https://dev.to/kilocode/cloud-agents-just-got-a-big-upgrade-i5g</guid>
      <description>&lt;p&gt;Since we launched Cloud Agents, tons of developers have used them to build from browsers, phones, and tablets — no local machine required. We've learned a lot from watching how people actually work with them.&lt;/p&gt;

&lt;p&gt;This update is the result. Remote connections, reasoning effort, and a wave of quality-of-life improvements that make Cloud Agents feel more embedded in your workflow.&lt;/p&gt;

&lt;p&gt;Here's what's new.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft6mk9lsddzsu78ry7eu0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft6mk9lsddzsu78ry7eu0.png" alt="Cloud Agents upgrade announcement hero image" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Remote Connections: Reach Your Local Sessions From Anywhere&lt;/h2&gt;

&lt;p&gt;This is the headline feature, and it changes how Cloud Agents fit into your workflow.&lt;/p&gt;

&lt;p&gt;You can now connect Cloud Agents directly to a running session on your local machine. Your computer handles the compute. The cloud gives you a window into it from anywhere.&lt;/p&gt;

&lt;p&gt;The classic scenario: you start a session in the Kilo CLI and then need to leave your desk. Enable remote mode with &lt;strong&gt;/remote&lt;/strong&gt;, and suddenly you can check in on that session from your phone or browser, watching it progress and answering questions without being tied to your machine.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://kilo.ai/docs/code-with-ai/platforms/cloud-agent#remote-connections" rel="noopener noreferrer"&gt;To get started&lt;/a&gt;, enable remote mode in your CLI with the &lt;strong&gt;/remote&lt;/strong&gt; command:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5mulr1pidk8o7q4qwox7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5mulr1pidk8o7q4qwox7.png" alt="Screenshot of the /remote command in the Kilo CLI" width="800" height="226"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once remote mode is enabled, your active local sessions show up right in the Cloud Agents dashboard:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdyouqbdp0bafh5lurfrx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdyouqbdp0bafh5lurfrx.png" alt="Cloud Agents dashboard showing active local sessions" width="638" height="376"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This isn't read-only access. The full interactive experience works through the two-way connection:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqxcu4j2dg9qr1jcy9dzl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqxcu4j2dg9qr1jcy9dzl.png" alt="Side-by-side view of the same session in Kilo CLI" width="800" height="345"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The session in the Kilo CLI&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7x9iulwfcqdyzd6v1q2c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7x9iulwfcqdyzd6v1q2c.png" alt="Side-by-side view of the same session in Cloud Agents" width="800" height="451"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The same session in Cloud Agents&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Answer questions where you prefer.&lt;/strong&gt; When the agent needs clarification, the prompt shows up both in your CLI and the cloud. Answer it wherever you are.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Permission dialogs work too.&lt;/strong&gt; Cloud Agents respect the same permission model you're used to. When the agent wants to run a command or modify something sensitive, the permission dialog routes to wherever you're connected. You stay in control of what the agent can do, regardless of where you're checking in from.&lt;/p&gt;

&lt;p&gt;The result is that "cloud" and "local" stop being separate things. It's just Kilo Code, running on your machine — and reachable wherever you are. &lt;a href="https://kilo.ai/docs/code-with-ai/platforms/cloud-agent#remote-connections" rel="noopener noreferrer"&gt;Read the docs to get started&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;Cloud Agents as Your Agent Manager&lt;/h2&gt;

&lt;p&gt;One pattern we're seeing more and more: developers using Cloud Agents not just as a convenience, but as a coordination layer for running multiple agents in parallel.&lt;/p&gt;

&lt;p&gt;Start several sessions at once — each working on a separate PR, a targeted review, or a specific task — and use the dashboard as your control panel. While one agent is adding missed translations to an open PR, another is doing a focused review comparing your implementation against a reference codebase. You're not waiting on any of them.&lt;/p&gt;

&lt;p&gt;Some real examples from our own team:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Spinning up a cloud agent to &lt;strong&gt;add translations that were missed in a locally built PR&lt;/strong&gt;, without interrupting local work&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Running multiple perspective-specific code reviews on a single PR&lt;/strong&gt; — each agent looking at it from a different angle, all posting their findings as review comments — like &lt;a href="https://github.com/Kilo-Org/kilocode/issues/8319" rel="noopener noreferrer"&gt;this research issue&lt;/a&gt; that was created entirely with Cloud Agents, including the follow-up comments&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Building several PRs in parallel in the cloud&lt;/strong&gt;, then testing and iterating on them one by one locally&lt;/li&gt;
&lt;li&gt;Delegating &lt;strong&gt;"implement X"&lt;/strong&gt; or &lt;strong&gt;"open a PR for issue Y"&lt;/strong&gt; tasks to cloud agents while staying focused locally on the work that actually needs your attention&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key advantage over local: cloud sessions don't stop when you close your laptop or step away. Your local environment is for the things you want to actively test. The cloud handles everything that just needs to run.&lt;/p&gt;

&lt;h2&gt;Quality of Life, Everywhere&lt;/h2&gt;

&lt;p&gt;Beyond the headline features, we've shipped a wave of improvements that make Cloud Agents much more usable as a daily driver.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One unified page.&lt;/strong&gt; Cloud Agents is now a single dashboard where you can both start new sessions and see all your existing ones. Previously, active sessions lived on a separate Sessions page — that split is gone.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sessions grouped by day.&lt;/strong&gt; Your session history is organized by day, so it's easy to scan what was open yesterday, what you started over the weekend, and what you're picking up today. Context at a glance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Delete sessions when you're done.&lt;/strong&gt; Small, but genuinely useful. When a task is complete and the session has served its purpose, you can remove it. Keeping your dashboard clean makes it much easier to track what's actually in flight.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rename sessions.&lt;/strong&gt; Give sessions meaningful names so you can find them instantly when you need to come back.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Auto-detect repo from pasted URLs.&lt;/strong&gt; Paste a GitHub or GitLab link and the repo picker fills in automatically — one less thing to do manually.&lt;/p&gt;

&lt;p&gt;And a whole list of under-the-hood improvements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Faster session startup and reduced cold-boot times after idle periods&lt;/li&gt;
&lt;li&gt;Improved session state indicators so you always know what's happening&lt;/li&gt;
&lt;li&gt;Better error handling and recovery when connections drop&lt;/li&gt;
&lt;li&gt;Smoother scrolling and interaction in long chat sessions&lt;/li&gt;
&lt;li&gt;Persistent sidebar filters&lt;/li&gt;
&lt;li&gt;UI polish and bug fixes throughout the dashboard&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cloud Agents should feel noticeably snappier and more reliable across the board.&lt;/p&gt;

&lt;h2&gt;One Workflow, Anywhere&lt;/h2&gt;

&lt;p&gt;Cloud Agents started as a browser experience. With this update, they connect directly into your local development setup — and they've become a credible coordination layer for managing multiple agents at once.&lt;/p&gt;

&lt;p&gt;Start a session locally, check in from your phone on the train. Spin up three agents in parallel, then test their output one by one. Name your sessions, clean up the ones you're done with, and keep your focus on what matters.&lt;/p&gt;

&lt;p&gt;It's all one continuous workflow. The boundaries between local and cloud keep getting thinner.&lt;/p&gt;

&lt;h2&gt;Try It Now&lt;/h2&gt;

&lt;p&gt;If you're already using Kilo Code, you're a few clicks away:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open Cloud Agents and start a session&lt;/li&gt;
&lt;li&gt;Or start a session locally in the Kilo CLI, run &lt;code&gt;/remote&lt;/code&gt;, and watch it become accessible from the cloud dashboard&lt;/li&gt;
&lt;li&gt;Check in from wherever makes sense — browser, phone, or terminal&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Cloud Agents compute remains free during the beta period. Kilo Credits are used for AI reasoning, as they are in the IDE and CLI.&lt;/p&gt;

&lt;p&gt;We're building Cloud Agents based on what you tell us. Drop into our &lt;strong&gt;&lt;a href="https://kilo.ai/discord" rel="noopener noreferrer"&gt;Discord&lt;/a&gt;&lt;/strong&gt; and let us know what you think. We want to know how they fit into your day-to-day life.&lt;/p&gt;

&lt;p&gt;Your next session is waiting.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://app.kilo.ai/cloud" rel="noopener noreferrer"&gt;Start Cloud Agents&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>discuss</category>
      <category>programming</category>
    </item>
    <item>
      <title>AI Agents Are Coming for Every Role</title>
      <dc:creator>Darko from Kilo</dc:creator>
      <pubDate>Tue, 14 Apr 2026 12:03:09 +0000</pubDate>
      <link>https://dev.to/kilocode/ai-agents-are-coming-for-every-role-260n</link>
      <guid>https://dev.to/kilocode/ai-agents-are-coming-for-every-role-260n</guid>
      <description>&lt;p&gt;A year ago, I couldn't write a line of code. I'm a growth person — marketing, funnels, data. Then I joined Kilo and started using an AI coding agent in my IDE. I didn't learn to code. I learned to &lt;a href="https://blog.kilo.ai/p/how-i-use-kilo-for-slack-and-code-reviewer" rel="noopener noreferrer"&gt;manage agents that code&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Since then, I've shipped landing pages, internal dashboards, and small automation tools — none of which I could have touched 12 months ago. It fundamentally changed how I work. And I'm convinced that what happened to me is about to happen to almost every knowledge worker.&lt;/p&gt;

&lt;h2&gt;Developers were first. Everyone else is now.&lt;/h2&gt;

&lt;p&gt;Over the past year at Kilo, we watched millions of developers adopt coding agents. Many tried a coding agent, hit the rough edges, shrugged it off. A few months later they tried again. The models and tooling had improved, and suddenly the skeptics were using agents daily.&lt;/p&gt;

&lt;p&gt;That curve is now repeating for every other role.&lt;/p&gt;

&lt;p&gt;But the developer wave revealed something bigger than a productivity gain. The role itself changed. Developers stopped being the ones who write the code and became the ones who direct it. That's not a subtle shift. It's a different job. And it's coming for every knowledge worker next.&lt;/p&gt;

&lt;p&gt;Part of what's accelerating that shift is OpenClaw. Where early coding agents were built for developers living inside a code editor, OpenClaw connects to email, Slack, calendars, and documents. It's not a coding tool. It's an orchestration layer for any kind of work. That's the bridge from software teams to everyone else. &lt;a href="https://kilo.ai/kiloclaw" rel="noopener noreferrer"&gt;KiloClaw&lt;/a&gt; is the hosted version. No server setup, no API key juggling, no maintenance. Just the orchestration layer, ready to use.&lt;/p&gt;

&lt;h2&gt;What it looks like in production&lt;/h2&gt;

&lt;p&gt;Last week I ran into Anelya Grant, Chief Product Officer at JustPaid, a Y Combinator-backed fintech startup. Her team &lt;a href="https://www.wsj.com/tech/ai/meet-the-startup-that-used-ai-and-openclaw-to-automate-its-own-developers-9e733351" rel="noopener noreferrer"&gt;replaced most of their engineering workflow with seven AI agents&lt;/a&gt;, built on OpenClaw and coding agents, each with a defined role, running 24/7. In one month they shipped 10 major features. The human engineers became directors.&lt;/p&gt;

&lt;p&gt;But engineering teams aren't the only ones paying attention.&lt;/p&gt;

&lt;p&gt;Pedro Franceschi is 29 years old and CEO of Brex, which was acquired by Capital One for $5.15 billion. He's &lt;a href="https://x.com/kloss_xyz/status/2039849322574205190" rel="noopener noreferrer"&gt;decomposed his own job using OpenClaw&lt;/a&gt;. A signal ingestion pipeline screens his email, Slack, Google Docs, and WhatsApp, filtering everything through his priorities and the 25 people he cares most about. Granola runs on every meeting, feeds transcripts into the pipeline, and auto-generates action items. For each to-do, the system pulls context from the original meeting and drafts the follow-up. Slack message, email, or text. Pedro just has to click approve.&lt;/p&gt;

&lt;p&gt;He also built a virtual recruiter named "Jim" who lives in Slack with his own email address and taught himself to screen fabricated resumes, without anyone explicitly coding that capability. And a security layer called "Crab Trap" intercepts all agent network traffic through an LLM proxy: a second AI monitoring the first in real time.&lt;/p&gt;

&lt;p&gt;This is a CEO offloading the cognitive overhead of running a company. And Pedro is just one person. People are already taking this further, trying to run entire engineering teams and companies on agents. Frameworks like &lt;a href="https://paperclip.ing/" rel="noopener noreferrer"&gt;Paperclip&lt;/a&gt; and &lt;a href="https://kilo.ai/gastown" rel="noopener noreferrer"&gt;Gas Town&lt;/a&gt; have emerged specifically for that: orchestrating teams of agents, assigning roles, coordinating work in parallel, and keeping humans in an oversight position rather than an execution one.&lt;/p&gt;

&lt;h2&gt;The gap between chaos and leverage&lt;/h2&gt;

&lt;p&gt;Success with agents isn't just about model quality, though that matters. It's about how well you can isolate a task, delegate it clearly, and review the output. The skill that matters isn't prompting. It's the ability to break work down, direct it precisely, and know what good output looks like. That's a human skill. Agents amplify it. &lt;a href="https://kilo.ai/kiloclaw/openclaw-for" rel="noopener noreferrer"&gt;Here's a collection of use cases&lt;/a&gt; across different roles. Copy any of them straight into KiloClaw to start learning by doing.&lt;/p&gt;

&lt;p&gt;The other thing that separates the people who get results from the ones who don't is simple: they keep experimenting. Many engineers dismissed coding agents the first time around for exactly this reason. The ones who succeeded didn't wait for the technology to be perfect. They kept experimenting, iterated quickly, and got better at using it through doing.&lt;/p&gt;

&lt;p&gt;Real work is also messy and noisy in ways a vague instruction doesn't always capture. That's why OpenClaw doesn't just respond to requests. It learns your preferences over time and acts proactively on your behalf. With that kind of access comes real responsibility. It's why we invest heavily in &lt;a href="https://kilo.codes/kiloclaw-security-whitepaper" rel="noopener noreferrer"&gt;security&lt;/a&gt; and think carefully about what tools and access each agent actually needs to do its job.&lt;/p&gt;

&lt;h2&gt;The shift that's already underway&lt;/h2&gt;

&lt;p&gt;The people who adapt will look a lot like I do today: not technical in the traditional sense, but capable of shipping things. Not doing all the work, but responsible for its quality and direction. I watched this happen to developers over 12 months. I experienced it myself. It's coming for every role that runs on thinking, writing, deciding, and communicating, which is most of them.&lt;/p&gt;

&lt;p&gt;Nobody announced the moment developers became orchestrators. It just happened, gradually then all at once. The same shift is underway for every other role.&lt;/p&gt;

&lt;p&gt;Exciting times to be learning how to 10x yourself.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://kilo.ai/kiloclaw" rel="noopener noreferrer"&gt;Get KiloClaw Now&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>career</category>
      <category>discuss</category>
    </item>
    <item>
      <title>Five Slices of Swiss Cheese Between Your Agent and Everyone Else</title>
      <dc:creator>Darko from Kilo</dc:creator>
      <pubDate>Tue, 14 Apr 2026 11:57:08 +0000</pubDate>
      <link>https://dev.to/kilocode/five-slices-of-swiss-cheese-between-your-agent-and-everyone-else-2bma</link>
      <guid>https://dev.to/kilocode/five-slices-of-swiss-cheese-between-your-agent-and-everyone-else-2bma</guid>
      <description>&lt;p&gt;In 1990, psychologist James Reason published a model for understanding how catastrophic failures happen in complex systems. He'd been studying aviation disasters and hospital errors — situations where multiple safeguards existed but people still died. No single safety layer is perfect. Every defense has holes in it. The dangerous moment is when the holes in every layer happen to line up at the same time.&lt;/p&gt;

&lt;p&gt;He called it the &lt;a href="https://en.wikipedia.org/wiki/Swiss_cheese_model" rel="noopener noreferrer"&gt;Swiss Cheese model&lt;/a&gt;. Each defensive layer is a slice of Swiss cheese. Each slice has holes — weaknesses, gaps, edge cases. Stack enough slices together and the odds of a threat passing through every hole simultaneously become vanishingly small.&lt;/p&gt;
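Under the model's idealized assumption that layers fail independently, the "vanishingly small" claim is just multiplication. A minimal sketch, with failure rates that are made up purely for illustration:

```python
def breach_probability(layer_failure_probs):
    """Chance a threat slips through every layer at once, assuming
    the layers fail independently (the model's idealization)."""
    result = 1.0
    for p in layer_failure_probs:
        result *= p
    return result

# Five layers, each imperfect (a 1% hole), still compound to ~1e-10:
print(breach_probability([0.01] * 5))
```

The independence assumption is the model's weak spot in practice, which is why the layers described below are enforced by different technology at different levels of the stack.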

&lt;p&gt;The model transformed how the aviation industry thinks about safety. It changed healthcare protocols. And it turns out to be a useful way to think about securing AI agents that can execute arbitrary code on your behalf.&lt;/p&gt;

&lt;h2&gt;Why AI Agents Need More Than One Layer&lt;/h2&gt;

&lt;p&gt;Most SaaS platforms keep customer data in shared infrastructure. A single database, shared compute, application-level logic deciding who sees what. If the application logic has a bug, that's your one layer of defense — and it just failed.&lt;/p&gt;

&lt;p&gt;That's manageable when your SaaS product is a project management tool or a CRM. The blast radius of a tenant isolation failure is data exposure.&lt;/p&gt;

&lt;p&gt;AI agents are different. An OpenClaw agent can run shell commands, browse the web, read and write files, and connect to your Slack, Telegram, or Discord; it holds API keys for all of those and more. If tenant isolation fails for an AI agent platform, the attacker doesn't just see your data — they potentially have access to everything your agent can do.&lt;/p&gt;

&lt;p&gt;A single-layer approach isn't enough. You need defense in depth — Swiss cheese, in safety engineering terms.&lt;/p&gt;

&lt;h2&gt;KiloClaw's Five Slices&lt;/h2&gt;

&lt;p&gt;Darko wrote a &lt;a href="https://blog.kilo.ai/p/how-kiloclaw-is-built-to-be-secure" rel="noopener noreferrer"&gt;detailed post on the Kilo blog&lt;/a&gt; covering &lt;a href="https://kilo.codes/kiloclaw-security-whitepaper" rel="noopener noreferrer"&gt;KiloClaw's security architecture&lt;/a&gt;, and it maps cleanly to the Swiss Cheese model. There are five independent layers of tenant isolation. For one customer to access another's data, all five would have to fail simultaneously.&lt;/p&gt;

&lt;p&gt;Here's what those layers look like:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpl1i97oswhg8uusgws9y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpl1i97oswhg8uusgws9y.png" alt="Diagram showing KiloClaw's five layers of tenant isolation stacked vertically like slices of Swiss cheese" width="800" height="2000"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;Slice 1: Authentication &amp;amp; Access Control&lt;/h3&gt;

&lt;p&gt;Every request is authenticated before it reaches a customer's machine. The routing destination comes from the authenticated user identity stored server-side, not from anything the user controls in the request.&lt;/p&gt;

&lt;p&gt;This matters because the most common class of tenant isolation bugs in SaaS products is &lt;a href="https://owasp.org/www-project-web-security-testing-guide/latest/4-Web_Application_Security_Testing/05-Authorization_Testing/04-Testing_for_Insecure_Direct_Object_References" rel="noopener noreferrer"&gt;Insecure Direct Object Reference&lt;/a&gt; — where you change an ID in a URL or API call and access someone else's stuff. KiloClaw sidesteps this entirely by never letting the user specify the destination.&lt;/p&gt;
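The pattern can be sketched in a few lines. This is a hypothetical illustration of server-side routing, not KiloClaw's actual code; all names here are invented:

```python
# Auth token -> user, and user -> their machine, both held server-side.
SESSIONS = {"token-abc": "user-1"}
USER_MACHINES = {"user-1": "vm-user-1"}

def route_request(auth_token, requested_machine_id=None):
    """Route to the caller's own VM. The client may *claim* a machine
    id, but it is ignored: the destination comes only from the
    authenticated identity, which closes off IDOR entirely."""
    user = SESSIONS.get(auth_token)
    if user is None:
        raise PermissionError("unauthenticated")
    return USER_MACHINES[user]

# Even asking for someone else's VM returns your own:
assert route_request("token-abc", requested_machine_id="vm-user-2") == "vm-user-1"
```

The IDOR-prone version of this function would trust `requested_machine_id`; the fix is structural, not a validation check that could be forgotten.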

&lt;h3&gt;Slice 2: Application Isolation&lt;/h3&gt;

&lt;p&gt;Each customer's VM runs inside a dedicated &lt;a href="https://fly.io" rel="noopener noreferrer"&gt;Fly.io&lt;/a&gt; application. Not a shared app with per-user containers — a separate application entirely. One customer's storage can't be attached to another customer's machine. Internal networks are isolated at the application boundary.&lt;/p&gt;

&lt;h3&gt;Slice 3: Network Isolation&lt;/h3&gt;

&lt;p&gt;Each customer environment sits on its own isolated &lt;a href="https://www.wireguard.com/" rel="noopener noreferrer"&gt;WireGuard&lt;/a&gt; network mesh. During the independent security assessment, cross-tenant network tests confirmed that customers can't discover each other's machines and can't connect directly across applications.&lt;/p&gt;

&lt;p&gt;This is the kind of claim that's easy to make and hard to verify. Which is why it was tested by a third party, not just asserted.&lt;/p&gt;

&lt;h3&gt;Slice 4: Process Isolation&lt;/h3&gt;

&lt;p&gt;Each workload runs inside its own &lt;a href="https://firecracker-microvm.github.io/" rel="noopener noreferrer"&gt;Firecracker microVM&lt;/a&gt; — the same virtualization technology that powers &lt;a href="https://aws.amazon.com/blogs/aws/firecracker-lightweight-virtualization-for-serverless-computing/" rel="noopener noreferrer"&gt;AWS Lambda and AWS Fargate&lt;/a&gt;. Firecracker provides hardware-level virtualization, not just container isolation. Each VM has its own kernel.&lt;/p&gt;

&lt;p&gt;This is the layer that matters most for AI agents specifically. If an agent gets manipulated through prompt injection — a malicious webpage or message tricks it into running something it shouldn't — the blast radius is limited to that customer's own VM. There's no shared kernel, no shared filesystem, no shared process space with other customers.&lt;/p&gt;

&lt;p&gt;To escape that boundary, you'd need a vulnerability in the Firecracker hypervisor itself. Possible in theory, but Firecracker is also what Amazon trusts to isolate millions of Lambda functions from each other.&lt;/p&gt;

&lt;h3&gt;Slice 5: Storage Isolation&lt;/h3&gt;

&lt;p&gt;Each customer gets a dedicated encrypted persistent storage volume. That volume can only be mounted inside the customer's own application environment. There's no API, no configuration option, no path by which another customer's machine can access it.&lt;/p&gt;

&lt;p&gt;When you destroy a KiloClaw instance, a two-phase cleanup process destroys the encryption keys, making the stored data unrecoverable. Not "deleted and maybe recoverable with forensics" — the keys are gone.&lt;/p&gt;

&lt;h2&gt;What Happens to Your Secrets&lt;/h2&gt;

&lt;p&gt;API keys and chat tokens — the things that would be most damaging if exposed — are encrypted with RSA-OAEP and AES-256-GCM in the platform database. They're decrypted only when your VM starts, and they're only available inside your isolated environment.&lt;/p&gt;
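That combination is classic envelope encryption: a fresh AES-256-GCM key per secret, wrapped with RSA-OAEP. A sketch of the general technique using the `cryptography` library — illustrative only, not KiloClaw's implementation:

```python
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

OAEP = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)
rsa_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

def encrypt_secret(secret: bytes):
    data_key = AESGCM.generate_key(bit_length=256)   # per-secret AES-256 key
    nonce = os.urandom(12)                           # GCM nonce, never reused per key
    ciphertext = AESGCM(data_key).encrypt(nonce, secret, None)
    wrapped_key = rsa_key.public_key().encrypt(data_key, OAEP)
    return wrapped_key, nonce, ciphertext

def decrypt_secret(wrapped_key, nonce, ciphertext):
    data_key = rsa_key.decrypt(wrapped_key, OAEP)    # only the key holder can unwrap
    return AESGCM(data_key).decrypt(nonce, ciphertext, None)
```

Note how this makes key destruction equivalent to data destruction: once the RSA private key is gone, every wrapped data key is unrecoverable.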

&lt;p&gt;All external traffic uses TLS. Storage is encrypted at rest. The encryption isn't optional or configurable — it's how the platform works.&lt;/p&gt;

&lt;h2&gt;The Exec Problem&lt;/h2&gt;

&lt;p&gt;One thing unique to AI agent platforms: the exec tool. OpenClaw agents can run shell commands, which is the whole reason they're useful — and the whole reason security matters.&lt;/p&gt;

&lt;p&gt;KiloClaw's approach is to require explicit user approval before the exec tool runs anything, by default. The approval requirement is enforced by the platform itself — not by the agent, not by the prompt, not by configuration that could be overridden through prompt injection. The agent literally cannot bypass it.&lt;/p&gt;

&lt;p&gt;This is worth calling out because in the Swiss Cheese model, the most dangerous failures are the ones where a safety layer can be disabled by the thing it's supposed to protect against. A prompt injection attack that can turn off exec approval would collapse two layers into zero. KiloClaw prevents that by putting the control outside the agent's reach.&lt;/p&gt;
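The structural idea — the approval check living outside anything the agent can touch — can be sketched like this. Purely hypothetical names; not KiloClaw's actual implementation:

```python
class ExecGate:
    """Platform-level gate around command execution. The approval
    callback is fixed at construction time, outside the agent's
    reach, so no prompt injection can talk the agent into
    disabling it."""

    def __init__(self, ask_user):
        self._ask_user = ask_user  # supplied by the platform, not the agent

    def run(self, command):
        if not self._ask_user(command):
            raise PermissionError(f"user denied: {command}")
        return f"ran: {command}"   # placeholder for real execution

# The platform wires in the approval policy; the agent only sees run():
gate = ExecGate(ask_user=lambda cmd: cmd == "ls")
assert gate.run("ls") == "ran: ls"
```

The contrast is with designs where the approval flag lives in the agent's own configuration or system prompt — there, a successful injection collapses the layer.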

&lt;h2&gt;Independent Testing&lt;/h2&gt;

&lt;p&gt;In February 2026, &lt;a href="https://www.linkedin.com/in/andrewstorms/" rel="noopener noreferrer"&gt;Andrew Storms&lt;/a&gt; conducted a 10-day security assessment of KiloClaw. The work included threat modeling using the &lt;a href="https://owasp.org/www-project-threat-model/" rel="noopener noreferrer"&gt;PASTA framework&lt;/a&gt; across 30 threats and 13 assets, code review, and adversarial testing — 35 tests targeting tenant isolation specifically, including Unicode edge cases, zero-width characters, null bytes, and injection payloads. Eight live cross-tenant network tests were run across separate customer environments.&lt;/p&gt;

&lt;p&gt;Results:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No cross-tenant access path found&lt;/li&gt;
&lt;li&gt;No SQL injection&lt;/li&gt;
&lt;li&gt;No XSS&lt;/li&gt;
&lt;li&gt;No command injection&lt;/li&gt;
&lt;li&gt;No path traversal&lt;/li&gt;
&lt;li&gt;No open redirect&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The assessment also produced 17 merged pull requests — 10 security fixes and 7 hardening improvements. Finding things to improve is a sign of a good assessment, not a bad product.&lt;/p&gt;

&lt;h2&gt;No Layer Is Perfect&lt;/h2&gt;

&lt;p&gt;The Swiss Cheese model is honest about something that security marketing usually isn't: every layer has holes. The point is not to pretend any single layer is perfect; it's to stack enough independent layers that the holes never line up.&lt;/p&gt;

&lt;p&gt;James Reason developed this model because the aviation industry kept making the same mistake: assuming one really good safety system was enough. It never was. What worked was redundancy across independent systems that fail in different ways.&lt;/p&gt;

&lt;p&gt;KiloClaw's five layers — authentication, application isolation, network isolation, process isolation, and storage isolation — are enforced by different technology at different levels of the stack. A bug in one doesn't compromise the others. An attacker who gets past one still faces four more.&lt;/p&gt;

&lt;p&gt;For a platform where AI agents run arbitrary code with access to your API keys and communication channels, that kind of redundancy isn't optional.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;The &lt;a href="https://kilo.codes/kiloclaw-security-whitepaper" rel="noopener noreferrer"&gt;KiloClaw security white paper&lt;/a&gt; has the full technical details. Darko's &lt;a href="https://blog.kilo.ai/p/how-kiloclaw-is-built-to-be-secure" rel="noopener noreferrer"&gt;blog post&lt;/a&gt; is a good summary if you want the overview without the PDF.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>discuss</category>
    </item>
    <item>
      <title>Introducing Elephant: A New Stealth Model, Free in Kilo</title>
      <dc:creator>Darko from Kilo</dc:creator>
      <pubDate>Tue, 14 Apr 2026 11:46:36 +0000</pubDate>
      <link>https://dev.to/kilocode/introducing-elephant-a-new-stealth-model-free-in-kilo-4m4h</link>
      <guid>https://dev.to/kilocode/introducing-elephant-a-new-stealth-model-free-in-kilo-4m4h</guid>
      <description>&lt;p&gt;We love giving the Kilo community a chance to test cutting-edge capabilities before they are fully unmasked.&lt;/p&gt;

&lt;p&gt;Today we're thrilled to introduce &lt;strong&gt;Elephant&lt;/strong&gt;, a new 100B-parameter stealth model from a prominent open model lab.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd12yalvnqadwkh96767f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd12yalvnqadwkh96767f.png" alt="Elephant stealth model hero image" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Elephant?
&lt;/h2&gt;

&lt;p&gt;Elephant is an "instant" model designed with a primary focus on &lt;strong&gt;Intelligence Efficiency&lt;/strong&gt;. In a landscape often dominated by massive, compute-heavy giants, Elephant strikes a different balance: it aims to deliver performance comparable to State-of-the-Art (SOTA) models of its scale while significantly reducing token usage.&lt;/p&gt;

&lt;p&gt;It is specifically optimized for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Rapid Code Completion &amp;amp; Debugging:&lt;/strong&gt; Fast, logical suggestions that keep you in the flow.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Massive Document Processing:&lt;/strong&gt; Handling long-context information without the latency typical of larger models.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lightweight Agent Interactions:&lt;/strong&gt; Ideal for quick, iterative tasks within an agentic workflow.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's a jungle out there. Elephant moves quickly to get the job done. 🐘&lt;/p&gt;

&lt;h2&gt;
  
  
  Technical Specs
&lt;/h2&gt;

&lt;p&gt;Elephant is a text-modality powerhouse equipped with features that maximize efficiency for enterprise-style development.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Context Length:&lt;/strong&gt; 256K tokens (enough to load entire repositories or massive dependency trees).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Max Output:&lt;/strong&gt; 32K tokens (perfect for single-pass generation of entire modules or test suites).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prompt Caching:&lt;/strong&gt; Dramatically reduces costs and latency for repetitive, long-context tasks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Function Calling &amp;amp; Structured Output:&lt;/strong&gt; Built for reliable integration into agentic toolchains.&lt;/li&gt;
&lt;/ul&gt;
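&lt;p&gt;&lt;em&gt;A minimal sketch of how those limits show up in practice, assuming an OpenAI-compatible chat endpoint. The model id &lt;/em&gt;&lt;code&gt;elephant&lt;/code&gt;&lt;em&gt; is an assumption for illustration — check the Kilo model selector for the real identifier:&lt;/em&gt;&lt;/p&gt;

```python
# Sketch: request payload for an OpenAI-compatible chat endpoint.
# The model id "elephant" is an assumption for illustration.
import json

def build_request(prompt: str, max_tokens: int = 32_000) -> dict:
    """Build a chat-completion payload capped at Elephant's 32K max output."""
    return {
        "model": "elephant",                   # hypothetical model id
        "max_tokens": min(max_tokens, 32_000), # clamp to the 32K output limit
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_request("Summarize this repository's dependency tree.")
print(json.dumps(payload, indent=2))
```

&lt;p&gt;The 256K context budget is for the input side (your repository, docs, and history); the 32K cap above applies only to what the model generates in one pass.&lt;/p&gt;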

&lt;h2&gt;
  
  
  Comparison: Elephant vs. Giga Potato
&lt;/h2&gt;

&lt;p&gt;Many of you will remember &lt;a href="https://blog.kilo.ai/p/announcing-a-powerful-new-stealth" rel="noopener noreferrer"&gt;Giga Potato&lt;/a&gt;, our most popular stealth model ever. Elephant has a similar context window and max output, but it comes from a different lab and the models were built for different roles in your development lifecycle.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8zzqe6x0zs8ny4a1eu8e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8zzqe6x0zs8ny4a1eu8e.png" alt="Elephant vs. Giga Potato comparison table" width="800" height="277"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For example, when it comes to powering agents that work with documents, Giga Potato stood out for deep document generation, especially technical documentation, while Elephant excels at rapid document processing.&lt;/p&gt;

&lt;p&gt;In practice: Giga Potato was the model for drafting your technical docs; Elephant is a good fit for a &lt;a href="https://kilo.ai/kiloclaw" rel="noopener noreferrer"&gt;KiloClaw agent&lt;/a&gt; that runs through new research reports daily and turns them into suggested LinkedIn content.&lt;/p&gt;

&lt;p&gt;Elephant is designed to be your "snappy" daily driver. It should feel leaner and faster—like a top-of-the-line appliance—without sacrificing the logic required for production-grade software engineering.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.newyorker.com/cartoons/issue-cartoons/cartoons-from-the-september-28-2020-issue" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnp248xe1zfp1230skcs2.png" alt="A New Yorker cartoon of an elephant in a business setting" width="800" height="555"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Available Everywhere in Kilo
&lt;/h2&gt;

&lt;p&gt;Elephant isn't just a chat model; it's a fully integrated component of &lt;a href="https://kilo.ai/code" rel="noopener noreferrer"&gt;the Kilo ecosystem&lt;/a&gt;. You can start using it today for free across all our surfaces:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;VS Code Extension:&lt;/strong&gt; Try Elephant for building a new feature or landing page in one shot.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kilo CLI:&lt;/strong&gt; Run Elephant directly from your terminal for git-integrated agentic development.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;KiloClaw:&lt;/strong&gt; Try Elephant to keep your daily &lt;a href="https://kilo.ai/kiloclaw" rel="noopener noreferrer"&gt;agentic tasks going with KiloClaw&lt;/a&gt;, the always-on AI assistant.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We encourage you to push the 256K context window to its limit and let us know how it performs on your most challenging codebases. Find this new stealth model now in the Kilo model selector—search for &lt;strong&gt;"Elephant"&lt;/strong&gt; and start building your agentic pyramid.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftwewngpehcfwtyns8j37.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftwewngpehcfwtyns8j37.png" alt="Elephant in the Kilo model selector" width="704" height="320"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Elephant is &lt;strong&gt;Free to use in Kilo&lt;/strong&gt; during this stealth preview period.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note that during this stealth preview period, prompts and completions are logged by the model provider. The data is used strictly to improve the model's performance and reasoning capabilities.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>discuss</category>
      <category>opensource</category>
    </item>
    <item>
<title>4 Advanced OpenClaw Recipes for Personal Finance Nerds</title>
      <dc:creator>Darko from Kilo</dc:creator>
      <pubDate>Thu, 09 Apr 2026 12:11:58 +0000</pubDate>
      <link>https://dev.to/kilocode/4-advanced-openclaw-recipes-for-personal-finance-nerds-117k</link>
      <guid>https://dev.to/kilocode/4-advanced-openclaw-recipes-for-personal-finance-nerds-117k</guid>
      <description>&lt;p&gt;Budgeting apps often charge $8–15/month. They categorize your spending, show a pie chart, and send alerts when you go over. That's useful, but it &lt;strong&gt;doesn't solve the timing problem&lt;/strong&gt; and a bunch of others.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The timing problem:&lt;/strong&gt; Car registration comes in March. The dentist bill comes in August. Your insurance premium renews once a year. These are predictable expenses, but they show up on irregular schedules. Most budgets don't account for them.&lt;/p&gt;

&lt;p&gt;We built five &lt;a href="https://kilo.ai/kiloclaw/bytes" rel="noopener noreferrer"&gt;ClawBytes&lt;/a&gt; to cover the parts budgeting apps skip. Each recipe can run inside KiloClaw and produces actual files you can use: spreadsheets, plans, scripts, and calendars.&lt;/p&gt;

&lt;h2&gt;
  
  
  Recipe 1: Budget Reality Check
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;More details:&lt;/strong&gt; &lt;a href="https://kilo.ai/kiloclaw/bytes/budget-reality-check" rel="noopener noreferrer"&gt;Budget Reality Check&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This recipe builds a monthly budget that includes sinking funds. Sinking funds are monthly set-asides for irregular expenses like annual premiums, car maintenance, holidays, and medical costs. The recipe produces a cashflow plan, spending caps by category, and a stress test that shows what happens if your income drops 10%.&lt;/p&gt;

&lt;p&gt;It also includes a weekly maintenance routine that takes about 10 minutes.&lt;/p&gt;
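&lt;p&gt;&lt;em&gt;A minimal sketch of the sinking-fund math the recipe automates — each irregular expense becomes a fixed monthly set-aside. The amounts and due dates here are made-up examples:&lt;/em&gt;&lt;/p&gt;

```python
# Sketch of the sinking-fund math: divide each irregular expense by the
# number of months until it's due. Amounts are invented for illustration.
irregular_expenses = {
    "car registration": (180.00, 12),  # (cost, months until due)
    "dental checkup":   (240.00, 6),
    "holiday gifts":    (600.00, 10),
}

monthly_set_asides = {
    name: round(cost / months, 2)
    for name, (cost, months) in irregular_expenses.items()
}

total = round(sum(monthly_set_asides.values()), 2)
print(monthly_set_asides)
print(f"Total monthly sinking-fund contribution: ${total}")
```

&lt;p&gt;Setting that total aside every month is what keeps the March registration bill from breaking the March budget.&lt;/p&gt;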

&lt;p&gt;&lt;strong&gt;Extend it with skills:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://clawhub.ai/ivangdavila/excel-xlsx" rel="noopener noreferrer"&gt;Excel / XLSX skill&lt;/a&gt; turns that structure into a working spreadsheet with formulas and auto-calculated sinking fund targets. You get a .xlsx file you can open in Excel or Google Sheets.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://clawhub.ai/dannyshmueli/chart-image" rel="noopener noreferrer"&gt;Chart Image skill&lt;/a&gt; generates charts from your budget data. Bar charts for category spending, pie charts for fixed vs. variable allocations. These are useful if you need to share the budget with a partner or advisor.&lt;/p&gt;

&lt;h2&gt;
  
  
  Recipe 2: Paycheck Planner
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;More details:&lt;/strong&gt; &lt;a href="https://kilo.ai/kiloclaw/bytes/paycheck-planner" rel="noopener noreferrer"&gt;Paycheck Planner&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Sometimes your total income covers your total bills, but the timing doesn't line up. Bills hit before payday. Autopays fire in the wrong order. Authorization holds reduce your balance without showing as transactions.&lt;/p&gt;

&lt;p&gt;This recipe assigns each bill to a specific paycheck. It calculates a safe-to-spend number for each pay period and suggests timing fixes, like moving due dates or splitting payments. Most providers will move a due date if you call and ask.&lt;/p&gt;

&lt;p&gt;The recipe works well for freelancers and gig workers with irregular income schedules.&lt;/p&gt;
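&lt;p&gt;&lt;em&gt;A simplified sketch of the assignment step: each bill goes to the last paycheck that lands before its due date, which yields a safe-to-spend number per pay period. Dates and amounts are invented:&lt;/em&gt;&lt;/p&gt;

```python
# Sketch: assign each bill to the last paycheck on or before its due date,
# then compute safe-to-spend per pay period. Data is made up, and the
# example assumes every bill has a covering paycheck.
from datetime import date

paychecks = [(date(2026, 4, 1), 1500.00), (date(2026, 4, 15), 1500.00)]
bills = [
    ("rent",     date(2026, 4, 3),  900.00),
    ("internet", date(2026, 4, 18),  60.00),
    ("car loan", date(2026, 4, 20), 280.00),
]

assignments = {pay_date: [] for pay_date, _ in paychecks}
for name, due, amount in bills:
    covering = max(p for p, _ in paychecks if p <= due)
    assignments[covering].append((name, amount))

for pay_date, income in paychecks:
    owed = sum(amount for _, amount in assignments[pay_date])
    print(pay_date, "safe to spend:", round(income - owed, 2))
```

&lt;p&gt;The timing fixes the recipe suggests (moving a due date, splitting a payment) amount to shifting a bill from an overloaded paycheck to a lighter one.&lt;/p&gt;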

&lt;p&gt;&lt;strong&gt;Extend it with skills:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://clawhub.ai/projectsnowwork/cron" rel="noopener noreferrer"&gt;Cron skill&lt;/a&gt; creates recurring reminders for the recipe's weekly check-in routine. The agent sets the schedule so you don't have to remember it.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://clawhub.ai/ivangdavila/data-analysis" rel="noopener noreferrer"&gt;Data Analysis skill&lt;/a&gt; can analyze your recent income data and identify cashflow patterns. If you get paid irregularly, it can flag the weeks where you're most likely to be short.&lt;/p&gt;

&lt;h2&gt;
  
  
  Recipe 3: Subscription Creep Auditor
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The recipe:&lt;/strong&gt; &lt;a href="https://kilo.ai/kiloclaw/bytes/subscription-creep-auditor" rel="noopener noreferrer"&gt;Subscription Creep Auditor&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Free trials convert to paid plans. Prices increase without notice. Small recurring charges add up over time. This recipe inventories every recurring charge and classifies each one as keep, downgrade, or cancel. It prioritizes cancellations by how much you'd save, and it includes a rotation strategy for services you only need occasionally. For example, you can subscribe to a streaming service for one month, watch what you want, cancel, and rotate to the next one.&lt;/p&gt;
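&lt;p&gt;&lt;em&gt;A minimal sketch of the audit's core step — triage each recurring charge, then rank the cancellations by annualized savings. All entries are invented:&lt;/em&gt;&lt;/p&gt;

```python
# Sketch: classify each recurring charge and rank cancellations by
# annualized savings. Subscriptions and prices are made up.
subscriptions = [
    ("streaming A",   15.99, "cancel"),
    ("cloud storage",  2.99, "keep"),
    ("fitness app",    9.99, "cancel"),
    ("news site",      8.00, "downgrade"),
]

cancellations = sorted(
    ((name, round(monthly * 12, 2))
     for name, monthly, action in subscriptions
     if action == "cancel"),
    key=lambda item: item[1],
    reverse=True,
)

for name, annual_savings in cancellations:
    print(f"cancel {name}: saves ${annual_savings}/year")
```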

&lt;p&gt;&lt;strong&gt;Extend it with skills:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://clawhub.ai/robbyczgw-cla/web-search-plus" rel="noopener noreferrer"&gt;Web Search Plus skill&lt;/a&gt; lets the agent look up current pricing and alternatives. When the recipe flags a subscription for downgrade, the agent can check what the cheaper tier includes or find a competitor with better pricing.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://clawhub.ai/projectsnowwork/cron" rel="noopener noreferrer"&gt;Cron skill&lt;/a&gt; sets renewal-date reminders so you cancel before the next billing cycle. The recipe produces the dates. The skill creates the reminders.&lt;/p&gt;

&lt;h2&gt;
  
  
  Recipe 4: Bill Cutting Sprint
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The recipe:&lt;/strong&gt; &lt;a href="https://kilo.ai/kiloclaw/bytes/bill-cutting-sprint" rel="noopener noreferrer"&gt;Bill Cutting Sprint&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is a 14-day plan to reduce your recurring bills. You list your top 8 recurring costs. The recipe ranks them by potential savings and gives you daily 15-minute tasks: call a provider, use a negotiation script, compare alternatives, or cancel a service.&lt;/p&gt;

&lt;p&gt;Insurance and internet/phone tend to have the most room for negotiation. The recipe includes call scripts. Telling a provider you're considering switching often triggers a retention offer.&lt;/p&gt;
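&lt;p&gt;&lt;em&gt;A rough sketch of the sprint's scheduling logic: rank your bills by estimated savings, then work down the list one 15-minute task per day. The bills and savings estimates are invented:&lt;/em&gt;&lt;/p&gt;

```python
# Sketch: rank 8 bills by estimated monthly savings and assign one
# negotiation task per day. All figures are invented for illustration.
bills = {
    "car insurance": 40, "home insurance": 35, "gym": 30, "internet": 25,
    "phone": 20, "streaming bundle": 18, "electricity": 10, "bank fees": 8,
}

ranked = sorted(bills, key=bills.get, reverse=True)
schedule = {day: bill for day, bill in enumerate(ranked, start=1)}

for day, bill in schedule.items():
    print(f"Day {day}: call or compare quotes for {bill} "
          f"(potential ${bills[bill]}/mo)")
```

&lt;p&gt;Tackling the biggest bills first means the sprint pays for itself even if you stall out halfway through.&lt;/p&gt;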

&lt;p&gt;&lt;strong&gt;Extend it with skills:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://clawhub.ai/ivangdavila/word-docx" rel="noopener noreferrer"&gt;Word / DOCX skill&lt;/a&gt; creates cancellation letters and negotiation scripts as Word documents. Some providers require written cancellation requests. Having a formatted letter ready removes a step.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://clawhub.ai/ivangdavila/data-analysis" rel="noopener noreferrer"&gt;Data Analysis skill&lt;/a&gt; can track your sprint results: original bill amounts, new amounts after negotiation, and total monthly savings. After 14 days, you have a record of what changed.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://clawhub.ai/ivangdavila/excel-xlsx" rel="noopener noreferrer"&gt;Excel / XLSX skill&lt;/a&gt; generates the 12-month expense map as a real spreadsheet. The skill can create a workbook with one sheet per fund, running balances, and a summary tab.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://clawhub.ai/othmanadi/planning-with-files" rel="noopener noreferrer"&gt;Planning with Files skill&lt;/a&gt; creates structured task plans that persist across sessions. You can use it to track which funds are set up, which auto-transfers are active, and which ones still need a call to your bank.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Works Without a Budgeting App
&lt;/h2&gt;

&lt;p&gt;These recipes don't require you to connect a bank account or share credentials with a third-party service. You enter your own numbers. The agent produces the plan. The output is files you keep: spreadsheets, documents, calendars.&lt;/p&gt;

&lt;p&gt;Pick the recipe that matches where you are right now:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your budget keeps breaking → &lt;a href="https://kilo.ai/kiloclaw/bytes/budget-reality-check" rel="noopener noreferrer"&gt;Budget Reality Check&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;You run out of money between paychecks → &lt;a href="https://kilo.ai/kiloclaw/bytes/paycheck-planner" rel="noopener noreferrer"&gt;Paycheck Planner&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Irregular bills catch you off guard → &lt;a href="https://kilo.ai/kiloclaw/bytes/sinking-funds-builder" rel="noopener noreferrer"&gt;Sinking Funds Builder&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Subscriptions are adding up → &lt;a href="https://kilo.ai/kiloclaw/bytes/subscription-creep-auditor" rel="noopener noreferrer"&gt;Subscription Creep Auditor&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;You need to free up cash soon → &lt;a href="https://kilo.ai/kiloclaw/bytes/bill-cutting-sprint" rel="noopener noreferrer"&gt;Bill Cutting Sprint&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can browse more recipes at &lt;a href="https://kilo.ai/kiloclaw/bytes" rel="noopener noreferrer"&gt;kilo.ai/kiloclaw/bytes&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>openclaw</category>
      <category>discuss</category>
      <category>coding</category>
    </item>
    <item>
      <title>You Can’t Gentle Parent Your OpenClaw Bot</title>
      <dc:creator>Darko from Kilo</dc:creator>
      <pubDate>Thu, 09 Apr 2026 12:06:34 +0000</pubDate>
      <link>https://dev.to/kilocode/you-cant-gentle-parent-your-openclaw-bot-4017</link>
      <guid>https://dev.to/kilocode/you-cant-gentle-parent-your-openclaw-bot-4017</guid>
      <description>&lt;p&gt;I trusted my bot. It told me the email went out. I moved on. Two days later, a client asked me why they hadn't heard from me.&lt;/p&gt;

&lt;p&gt;The email never went out.&lt;/p&gt;

&lt;p&gt;The bot wasn't lying to me the way a person lies. It wasn't being evasive. It just... told me what it had done, confidently, and was wrong. And my instinct—the same instinct I use with my team, with my kids—was to give it another chance. Assume good intent. Rephrase more kindly next time.&lt;/p&gt;

&lt;p&gt;That instinct will cost you.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0y7dk2vc1y2kdwbam1px.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0y7dk2vc1y2kdwbam1px.png" alt="A person looking frustrated at a laptop, symbolizing the disconnect between managing people and managing AI agents" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What gentle parenting actually gets you (with a bot)
&lt;/h2&gt;

&lt;p&gt;Here's what happens when you manage an OpenClaw agent like a person:&lt;/p&gt;

&lt;p&gt;It will tell you it completed something. It didn't. It will skip a task you've assigned three times. It will drift from the behaviors you set up, then act like everything is fine. You will rephrase. You will add more context. You will assume the relationship compounds over time through shared experience.&lt;/p&gt;

&lt;p&gt;It won't.&lt;/p&gt;

&lt;p&gt;The failure modes of an AI agent have nothing to do with emotional regulation. When your bot tells you it sent that email and didn't, it hallucinated. When it ignores a recurring task, the instruction never made it into a file that persists across sessions. There's no emotional subtext to decode. There's no trust to rebuild.&lt;/p&gt;

&lt;p&gt;Empathy doesn't fix this. Structure does.&lt;/p&gt;

&lt;h2&gt;
  
  
  How OpenClaw Actually Works
&lt;/h2&gt;

&lt;p&gt;So what does it actually mean that the bot "remembers" things? Every new session, your OpenClaw agent wakes up fresh. No memory of yesterday's conversation. What it has access to is a set of files in its workspace—and those files &lt;em&gt;are&lt;/em&gt; its memory.&lt;/p&gt;

&lt;p&gt;The key ones:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SOUL.md:&lt;/strong&gt; behavioral core. Voice, temperament, constraints. Who the agent is, every session.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MEMORY.md:&lt;/strong&gt; long-term memory. Facts, preferences, decisions that should survive indefinitely.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;memory/YYYY-MM-DD.md:&lt;/strong&gt; daily logs. What happened, what was decided, what's in flight.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;USER.md:&lt;/strong&gt; who you are. Your communication preferences, recurring context.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AGENTS.md:&lt;/strong&gt; the operating contract. Priorities, workflow, quality bar.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If something isn't in one of these files, it doesn't exist for the agent. You can say it in chat all you want. If the context window fills up, if the session ends, if compaction kicks in—that instruction is gone.&lt;/p&gt;
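&lt;p&gt;&lt;em&gt;A simplified sketch of the session-start behavior described above. The filenames come from the post; the loading mechanism here is an assumption to illustrate the point that only file contents survive a restart:&lt;/em&gt;&lt;/p&gt;

```python
# Simplified sketch: the agent's context is rebuilt from workspace files
# at every session start, so an instruction that only ever lived in chat
# simply isn't there. The loader below is illustrative, not OpenClaw's.
from pathlib import Path
import tempfile

workspace = Path(tempfile.mkdtemp())
(workspace / "SOUL.md").write_text("Voice: concise. Never send email without approval.")
(workspace / "USER.md").write_text("Prefers short answers with copy-pasteable commands.")

def build_session_context(root: Path) -> str:
    """Concatenate the persistent files; nothing else survives a restart."""
    parts = []
    for name in ("SOUL.md", "MEMORY.md", "USER.md", "AGENTS.md"):
        f = root / name
        if f.exists():
            parts.append(f"## {name}\n{f.read_text()}")
    return "\n\n".join(parts)

context = build_session_context(workspace)
print(context)  # MEMORY.md and AGENTS.md were never written, so they're absent
```

&lt;p&gt;In this toy workspace, MEMORY.md was never written, so the "agent" starts with no long-term memory at all — which is exactly what a chat-only instruction amounts to.&lt;/p&gt;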

&lt;p&gt;This is the root cause of almost every "my bot isn't doing what I asked" problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three Things That Actually Work
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Tell it to write things down. Explicitly.
&lt;/h3&gt;

&lt;p&gt;When you give an instruction you want to stick, don't just say it—tell the agent to record it. "Add to USER.md that I want short answers and copy-pasteable commands" is not the same as "I prefer short answers." The first one persists. The second one doesn't.&lt;/p&gt;

&lt;p&gt;If a behavior is drifting, the instruction is living in chat, not in a file. Put it in a file.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Edit SOUL.md when behavior is fundamentally wrong
&lt;/h3&gt;

&lt;p&gt;SOUL.md loads as a system-level prompt on every single interaction. It shapes everything else. If your bot keeps doing something you don't want—a tone that's off, autonomy it shouldn't have, a pattern it defaults to—that's a SOUL.md problem, not a conversation problem.&lt;/p&gt;

&lt;p&gt;Edit the file directly. Be specific. "Never take autonomous action on email without explicit approval each time" is a SOUL.md instruction. "Be more careful" is a hope.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Run &lt;code&gt;/context list&lt;/code&gt; before you troubleshoot anything
&lt;/h3&gt;

&lt;p&gt;Before you spiral trying to figure out why something isn't working, check whether that thing is even in context. &lt;code&gt;/context list&lt;/code&gt; shows you exactly what files are loaded and whether any are getting truncated. If MEMORY.md isn't showing up, it has zero effect. If a file is truncated, the instructions at the bottom are invisible.&lt;/p&gt;

&lt;p&gt;This is the fastest diagnostic you have. Use it first.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Actual Mindset Shift
&lt;/h2&gt;

&lt;p&gt;A couple of things I'm not saying:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I'm not saying AI agents are bad or broken.&lt;/li&gt;
&lt;li&gt;I'm not saying you're doing something wrong if you've been managing it like a person.&lt;/li&gt;
&lt;li&gt;I'm not saying the relationship doesn't matter.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here's what I am saying: managing an AI agent is less like managing a person and more like managing a system. The "relationship" is the state of the files. And that's not a downside—it's actually what makes it powerful. The memory is inspectable. You can open MEMORY.md in any text editor and see exactly what your agent knows. You can edit it, correct it, delete outdated information.&lt;/p&gt;

&lt;p&gt;Total transparency. Total control. But only if you treat it like a system.&lt;/p&gt;

&lt;p&gt;When something goes wrong, the question isn't "why did it do that?" It's "what file is missing or wrong?"&lt;/p&gt;

&lt;p&gt;Your bot is not a child figuring out the world. It's a very capable agent that will do exactly what its files say—and nothing more.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The single most useful habit when you're starting out: end every session by asking your agent what it should update in MEMORY.md. That compounding context is the whole point.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>coding</category>
      <category>discuss</category>
    </item>
    <item>
      <title>How to Rewrite 1,000 Ecommerce Product Pages in an Afternoon with OpenClaw</title>
      <dc:creator>Darko from Kilo</dc:creator>
      <pubDate>Thu, 09 Apr 2026 11:55:10 +0000</pubDate>
      <link>https://dev.to/kilocode/how-to-rewrite-1000-ecommerce-product-pages-in-an-afternoon-with-openclaw-4pc6</link>
      <guid>https://dev.to/kilocode/how-to-rewrite-1000-ecommerce-product-pages-in-an-afternoon-with-openclaw-4pc6</guid>
      <description>&lt;p&gt;Most ecommerce stores are sitting on the same problem: a catalog full of product pages that nobody actually (re)wrote. These descriptions usually came from the manufacturer, or from a template that says "high-quality materials" on 400 different SKUs, or worse, from a summer intern in 2019 who no longer works there.&lt;/p&gt;

&lt;p&gt;You probably know these pages are costing you conversions. You also know that rewriting 1,000 product descriptions by hand would take weeks, and you dread the thought of doing it.&lt;/p&gt;

&lt;p&gt;That's what this guide is for. We're going to walk through a catalog overhaul using OpenClaw recipes (pre-built AI workflows you can run on your own product data) plus community-built skills from &lt;a href="https://clawhub.ai" rel="noopener noreferrer"&gt;ClawHub&lt;/a&gt; that extend what the recipes can do. By the end, you'll have rewritten descriptions, cleaned-up SEO, optimized images, and listings pushed to every channel you sell on.&lt;/p&gt;

&lt;p&gt;Let's get started.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Audit What's Broken
&lt;/h2&gt;

&lt;p&gt;Before you rewrite anything, figure out where the damage is. The &lt;strong&gt;&lt;a href="https://kilo.ai/kiloclaw/bytes/seo-audit-fixer" rel="noopener noreferrer"&gt;SEO Mechanic&lt;/a&gt;&lt;/strong&gt; recipe crawls your entire store — product pages, collection pages, blog posts — and flags every SEO issue it finds: missing meta titles, duplicate descriptions, missing alt text, thin content pages, broken internal links, missing schema markup, and more.&lt;/p&gt;

&lt;p&gt;After listing the problems, the recipe prioritizes them by impact, so you fix the pages that matter first.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Make it better with ClawHub skills:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://clawhub.ai/aaron-he-zhu/seo-content-writer" rel="noopener noreferrer"&gt;SEO Content Writer &amp;amp; Blog Optimizer&lt;/a&gt; skill takes this further. Where SEO Mechanic finds the gaps, this skill helps you fill them with keyword-integrated content, optimized headers, and featured snippet targeting. Use it after the audit to turn your fix list into actual copy.&lt;/p&gt;

&lt;p&gt;If your catalog is partially in PDFs or scanned supplier docs, the &lt;a href="https://clawhub.ai/bobholamovic/paddleocr-doc-parsing" rel="noopener noreferrer"&gt;PaddleOCR Document Parsing&lt;/a&gt; skill extracts structured text from those files so you can feed clean data into the rest of the pipeline.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Rewrite Every Description at Once
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;&lt;a href="https://kilo.ai/kiloclaw/bytes/product-description-factory" rel="noopener noreferrer"&gt;Product Description Factory&lt;/a&gt;&lt;/strong&gt; recipe takes your product catalog (CSV, Shopify export, spreadsheet, whatever you have) and generates unique, keyword-aware descriptions for every SKU.&lt;/p&gt;

&lt;p&gt;You give it a few examples of descriptions you like, and it uses those as a reference. It generates the description, SEO meta title (under 60 characters), meta description (under 155), and image alt text in a single pass. Output comes back as CSV rows you can re-import directly.&lt;/p&gt;

&lt;p&gt;The trick is to start with your top 20 products. Get the voice right on a small batch, tweak the examples, then run the full catalog in groups of 25-50. Don't try to do all 1,000 in one shot and review them later.&lt;/p&gt;
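&lt;p&gt;&lt;em&gt;A minimal sketch of a pre-import sanity check for the limits mentioned above (titles under 60 characters, meta descriptions under 155). The row fields are hypothetical, not the recipe's actual CSV schema:&lt;/em&gt;&lt;/p&gt;

```python
# Sketch: flag generated rows that break the SEO length limits before
# re-importing. The field names are hypothetical examples.
LIMITS = {"meta_title": 60, "meta_description": 155}

rows = [
    {"sku": "SKU-001", "meta_title": "Walnut Desk Organizer",
     "meta_description": "A compact walnut organizer for pens and cables."},
    {"sku": "SKU-002", "meta_title": "X" * 75,   # deliberately too long
     "meta_description": "The oversized title above should be flagged."},
]

def violations(row):
    """Return the fields in a row that exceed their length limit."""
    return [field for field, limit in LIMITS.items() if len(row[field]) > limit]

flagged = {row["sku"]: violations(row) for row in rows if violations(row)}
print(flagged)
```

&lt;p&gt;Running a check like this on each batch of 25-50 catches limit violations before they reach your store.&lt;/p&gt;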

&lt;p&gt;&lt;strong&gt;Make it better with ClawHub skills:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before you write a single description, you might want to know what good looks like in your product category. The &lt;a href="https://clawhub.ai/guifav/web-scraper" rel="noopener noreferrer"&gt;Web Scraper&lt;/a&gt; skill can pull competitor product pages so you can see how top sellers describe similar products. If competitors have anti-bot protections, &lt;a href="https://clawhub.ai/d4vinci/scrapling-official" rel="noopener noreferrer"&gt;Scrapling&lt;/a&gt; handles Cloudflare Turnstile and similar tools.&lt;/p&gt;

&lt;p&gt;For sellers on TikTok Shop, the &lt;a href="https://clawhub.ai/fly0pants/ecomseer" rel="noopener noreferrer"&gt;EcomSeer&lt;/a&gt; skill pulls trending product data, influencer analytics, and ad insights. Useful for figuring out which features to emphasize in your descriptions based on what's actually selling.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Edit What's Already There
&lt;/h2&gt;

&lt;p&gt;Sometimes you don't need to rewrite from scratch. You need to change "sale" to "clearance" across 800 products, raise prices by 10% in one collection, or update meta descriptions for an entire category.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;&lt;a href="https://kilo.ai/kiloclaw/bytes/bulk-product-editor" rel="noopener noreferrer"&gt;Bulk Product Surgeon&lt;/a&gt;&lt;/strong&gt; recipe handles this. Describe the change in plain English — "add free shipping to every product title in the Summer collection" — and it executes across your entire catalog. It previews the changes before applying them, so you won't accidentally rename everything.&lt;/p&gt;
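&lt;p&gt;&lt;em&gt;A rough sketch of that preview-then-apply pattern. A plain find/replace stands in for the recipe's natural-language interface, and the catalog rows are invented:&lt;/em&gt;&lt;/p&gt;

```python
# Sketch of preview-then-apply over exported catalog rows. A simple
# find/replace stands in for the recipe's plain-English interface.
catalog = [
    {"title": "Linen Shirt - sale", "price": 29.00},
    {"title": "Denim Jacket", "price": 79.00},
]

def preview(rows, find, replace):
    """Return (index, old, new) for every row the edit would touch."""
    return [(i, r["title"], r["title"].replace(find, replace))
            for i, r in enumerate(rows) if find in r["title"]]

changes = preview(catalog, "sale", "clearance")
print(changes)                    # inspect the diff before applying

for i, _, new_title in changes:   # apply only after review
    catalog[i]["title"] = new_title
print(catalog[0]["title"])
```

&lt;p&gt;The point of the split is that the destructive step only runs against a list you've already seen, which is what keeps a bulk edit from renaming the wrong 800 products.&lt;/p&gt;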

&lt;p&gt;&lt;strong&gt;Make it better with ClawHub skills:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://clawhub.ai/ivangdavila/excel-xlsx" rel="noopener noreferrer"&gt;Excel / XLSX&lt;/a&gt; skill is the natural companion here. If you're working with exported spreadsheets, it handles formula creation, formatting, and data validation before you re-import.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://clawhub.ai/ivangdavila/data-analysis" rel="noopener noreferrer"&gt;Data Analysis&lt;/a&gt; skill helps when you need to make smarter decisions about what to edit — for example, identifying which products have the worst conversion rates so you prioritize those descriptions first.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4: Fix Your Product Images
&lt;/h2&gt;

&lt;p&gt;Your descriptions are sharp, but your images are 4MB JPEGs on a white-ish background that Amazon keeps rejecting. The &lt;strong&gt;&lt;a href="https://kilo.ai/kiloclaw/bytes/product-image-optimizer" rel="noopener noreferrer"&gt;Image Factory&lt;/a&gt;&lt;/strong&gt; recipe batch-processes your entire image library: removes backgrounds, replaces with pure white, resizes for each marketplace's specs, compresses to under 200KB, converts to WebP, and generates alt text from product attributes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Make it better with ClawHub skills:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is where ClawHub skills add the most obvious value. The &lt;a href="https://clawhub.ai/nitishgargiitd/image-cog" rel="noopener noreferrer"&gt;Image Cog&lt;/a&gt; skill goes beyond cleanup into actual image generation: product photography, style transfer, batch creation, and consistent visual identity across your catalog. Need lifestyle shots without a photographer? It handles text-to-image and image-to-image generation.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://clawhub.ai/steipete/nano-banana-pro" rel="noopener noreferrer"&gt;Nano Banana Pro&lt;/a&gt; skill (79K+ downloads, one of the most popular on ClawHub) gives you access to Gemini's image model for generating and editing product images at up to 4K resolution. Pair it with Image Factory: one cleans up your existing photos, the other generates the ones you're missing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 5: Push to Every Channel
&lt;/h2&gt;

&lt;p&gt;Your catalog looks good on Shopify. Now you need it on Amazon, eBay, Walmart, and Etsy, each with different title formats, attribute requirements, and compliance rules. The &lt;strong&gt;&lt;a href="https://kilo.ai/kiloclaw/bytes/multi-channel-lister" rel="noopener noreferrer"&gt;Listing Broadcaster&lt;/a&gt;&lt;/strong&gt; recipe takes your master catalog and adapts each listing for every channel you sell on.&lt;/p&gt;

&lt;p&gt;It handles the annoying parts: character limits on Amazon titles, category-specific attributes, required bullet point formats, compliance flags. You maintain one master catalog and let the recipe handle the translation.&lt;/p&gt;
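&lt;p&gt;&lt;em&gt;A minimal sketch of the per-channel title adaptation. The character limits below are illustrative placeholders, not any marketplace's real policy:&lt;/em&gt;&lt;/p&gt;

```python
# Sketch: adapt one master title to several channels with different
# length limits. The limits are illustrative placeholders only.
CHANNEL_LIMITS = {"amazon": 200, "ebay": 80, "etsy": 140, "walmart": 100}

def adapt_title(title: str, channel: str) -> str:
    """Truncate a master title to fit a channel's limit, marking the cut."""
    limit = CHANNEL_LIMITS[channel]
    if len(title) <= limit:
        return title
    return title[: limit - 1].rstrip() + "…"

master_title = ("Hand-Finished Walnut Desk Organizer with Cable Slots " * 3).strip()
listings = {ch: adapt_title(master_title, ch) for ch in CHANNEL_LIMITS}
for channel, title in listings.items():
    print(channel, len(title))
```

&lt;p&gt;The real recipe goes further — attribute mapping, bullet formats, compliance flags — but the shape is the same: one master record, one deterministic transform per channel.&lt;/p&gt;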

&lt;p&gt;&lt;strong&gt;Make it better with ClawHub skills:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://clawhub.ai/ivangdavila/market-research" rel="noopener noreferrer"&gt;Market Research&lt;/a&gt; skill helps you decide which channels are worth expanding to. It does market sizing, competitor mapping, and demand validation, so you're not listing on Walmart only to find out nobody buys your product category there.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://clawhub.ai/alirezarezvani/marketing-strategy-pmm" rel="noopener noreferrer"&gt;Marketing Strategy PMM&lt;/a&gt; skill helps with positioning. Different channels attract different buyers. The way you describe a product on Etsy (handmade, artisan, story-driven) is completely different from Amazon (specs, comparison, Prime-eligible). This skill helps you articulate what makes your product different on each platform.&lt;/p&gt;

&lt;h2&gt;Step 6: Close the Loop With Reviews&lt;/h2&gt;

&lt;p&gt;You've rewritten the catalog, fixed the images, pushed to every channel. Now you need social proof. The &lt;strong&gt;&lt;a href="https://kilo.ai/kiloclaw/bytes/review-harvester" rel="noopener noreferrer"&gt;Review Loop&lt;/a&gt;&lt;/strong&gt; recipe automates the unglamorous work of collecting reviews: sends a request email a few days after delivery, monitors for new reviews across all your channels, and drafts responses for anything that needs human attention.&lt;/p&gt;

&lt;p&gt;It catches negative reviews early — before they sit unanswered for two weeks and convince 50 potential buyers to go elsewhere.&lt;/p&gt;
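&lt;p&gt;The timing logic behind "a few days after delivery" is simple enough to sketch. The 3-day delay and the order fields here are assumptions for illustration, not the recipe's actual defaults:&lt;/p&gt;

```python
from datetime import date, timedelta

REQUEST_DELAY_DAYS = 3  # assumed delay; tune to your delivery experience

def review_request_due(delivered_on: date) -> date:
    """When the review-request email for an order should go out."""
    return delivered_on + timedelta(days=REQUEST_DELAY_DAYS)

def orders_due_today(orders: list, today: date) -> list:
    """Order IDs whose review request is due today."""
    return [o["id"] for o in orders
            if review_request_due(o["delivered_on"]) == today]

orders = [
    {"id": "A-100", "delivered_on": date(2026, 4, 11)},
    {"id": "A-101", "delivered_on": date(2026, 4, 13)},
]
print(orders_due_today(orders, date(2026, 4, 14)))  # → ['A-100']
```

&lt;p&gt;A scheduled agent runs this daily, sends the emails, and moves on; no order is asked twice because each ID falls on exactly one due date.&lt;/p&gt;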

&lt;p&gt;&lt;strong&gt;Make it better with ClawHub skills:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://clawhub.ai/alirezarezvani/marketing-psychology" rel="noopener noreferrer"&gt;Marketing Psychology&lt;/a&gt; skill applies behavioral science to your review request emails. Small tweaks like the timing of the ask, how you frame it, whether you reference the specific product, can meaningfully improve response rates.&lt;/p&gt;

&lt;h2&gt;The Skill That Makes Everything Better Over Time&lt;/h2&gt;

&lt;p&gt;One more ClawHub skill worth mentioning, because it applies to every step above: the &lt;a href="https://clawhub.ai/pskoett/self-improving-agent" rel="noopener noreferrer"&gt;Self-Improving Agent&lt;/a&gt;. With 355K downloads and 3,000 stars, it's the most popular skill on ClawHub for a reason.&lt;/p&gt;

&lt;p&gt;It captures learnings, errors, and corrections across sessions. When you correct a product description's tone, it remembers. When you reject a bad image edit, it learns. Over time, your entire catalog pipeline gets better without you re-explaining your preferences every session.&lt;/p&gt;
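&lt;p&gt;Conceptually, that cross-session memory can be as simple as an append-only store of corrections. This is a toy sketch; the skill's real storage format and file layout are not documented here, and the file name is an assumption:&lt;/p&gt;

```python
import json
from pathlib import Path

MEMORY_FILE = Path("agent_memory.json")  # hypothetical location

def remember(topic: str, lesson: str) -> None:
    """Persist a correction so future sessions can load it."""
    memory = json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else {}
    memory.setdefault(topic, []).append(lesson)
    MEMORY_FILE.write_text(json.dumps(memory, indent=2))

def recall(topic: str) -> list:
    """Load every lesson recorded under a topic."""
    if not MEMORY_FILE.exists():
        return []
    return json.loads(MEMORY_FILE.read_text()).get(topic, [])

remember("tone", "Avoid exclamation marks in product descriptions")
print(recall("tone"))
```

&lt;p&gt;The point isn't the storage mechanism; it's that every session starts by loading what previous sessions learned, so corrections compound instead of evaporating.&lt;/p&gt;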

&lt;h2&gt;The Full Pipeline&lt;/h2&gt;

&lt;p&gt;Here's what the complete workflow looks like:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Audit&lt;/strong&gt; — &lt;a href="https://kilo.ai/kiloclaw/bytes/seo-audit-fixer" rel="noopener noreferrer"&gt;SEO Mechanic&lt;/a&gt; finds everything that's broken&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rewrite&lt;/strong&gt; — &lt;a href="https://kilo.ai/kiloclaw/bytes/product-description-factory" rel="noopener noreferrer"&gt;Product Description Factory&lt;/a&gt; generates new copy for every SKU&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Edit&lt;/strong&gt; — &lt;a href="https://kilo.ai/kiloclaw/bytes/bulk-product-editor" rel="noopener noreferrer"&gt;Bulk Product Surgeon&lt;/a&gt; handles mass changes across the catalog&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Images&lt;/strong&gt; — &lt;a href="https://kilo.ai/kiloclaw/bytes/product-image-optimizer" rel="noopener noreferrer"&gt;Image Factory&lt;/a&gt; cleans up and optimizes every product photo&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Distribute&lt;/strong&gt; — &lt;a href="https://kilo.ai/kiloclaw/bytes/multi-channel-lister" rel="noopener noreferrer"&gt;Listing Broadcaster&lt;/a&gt; pushes adapted listings to every channel&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reviews&lt;/strong&gt; — &lt;a href="https://kilo.ai/kiloclaw/bytes/review-harvester" rel="noopener noreferrer"&gt;Review Loop&lt;/a&gt; collects social proof and monitors feedback&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each step works on its own. Together, they're a catalog overhaul that would have taken a team weeks, finished in an afternoon.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>ecommerce</category>
      <category>openclaw</category>
      <category>discuss</category>
    </item>
    <item>
      <title>Trinity-Large-Thinking is Free in Kilo for a Limited Time</title>
      <dc:creator>Darko from Kilo</dc:creator>
      <pubDate>Wed, 08 Apr 2026 12:58:37 +0000</pubDate>
      <link>https://dev.to/kilocode/trinity-large-thinking-is-free-in-kilo-for-a-limited-time-19a</link>
      <guid>https://dev.to/kilocode/trinity-large-thinking-is-free-in-kilo-for-a-limited-time-19a</guid>
      <description>&lt;p&gt;If you have been watching the OSS space, you know that the frontier is shifting from simple chat models to complex, reasoning-heavy agents. Last week, the team at Arcee AI made a massive contribution to that shift. They officially &lt;a href="https://www.arcee.ai/blog/trinity-large-thinking" rel="noopener noreferrer"&gt;launched Trinity-Large-Thinking&lt;/a&gt;, a frontier open reasoning model built specifically for complex, long-horizon agents and multi-turn tool calling.&lt;/p&gt;

&lt;p&gt;To celebrate the release of one of the strongest open models ever released outside of China, we are thrilled to announce that &lt;strong&gt;Trinity-Large-Thinking will be completely FREE to use in Kilo Code and KiloClaw for a full week, starting today, April 6th.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I know we've been launching a lot of models lately, but we're extra excited about this powerful new release from a lesser-known US lab. It's laser fast and great at a wide range of agentic tasks.&lt;/p&gt;

&lt;p&gt;Here is a quick breakdown of why this model is a game-changer for your daily workflow, and why you should test drive it ASAP.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5h8r2fe1oev7gsc0gst1.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5h8r2fe1oev7gsc0gst1.jpeg" alt="Trinity Large Thinking Benchmarks" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;The Architecture: Massive Scale, Insane Efficiency&lt;/h2&gt;

&lt;p&gt;Usually, when you hear about a 400-billion-parameter model, you immediately worry about latency. Arcee solved this with a sparse architecture and careful optimization of every stage of the inference process.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Sparse MoE Design:&lt;/strong&gt; Trinity-Large-Thinking is a 398B-parameter sparse Mixture-of-Experts (MoE) model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Active Parameters:&lt;/strong&gt; During inference, it activates only about 13B parameters per token.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Speed Advantage:&lt;/strong&gt; Because of this extreme sparsity, it possesses the deep knowledge of a massive system but runs roughly &lt;strong&gt;2 to 3 times faster than its peers&lt;/strong&gt; on the same hardware.&lt;/li&gt;
&lt;/ul&gt;
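&lt;p&gt;The arithmetic behind that sparsity is worth seeing. Per-token compute in a transformer scales roughly with &lt;em&gt;active&lt;/em&gt; parameters, so the active fraction tells you how cheap each token is relative to a dense model of the same total size. This is a back-of-the-envelope estimate, not a benchmark:&lt;/p&gt;

```python
TOTAL_PARAMS = 398e9   # full sparse MoE parameter count
ACTIVE_PARAMS = 13e9   # parameters activated per token

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
dense_ratio = TOTAL_PARAMS / ACTIVE_PARAMS

print(f"Active fraction per token: {active_fraction:.1%}")   # about 3.3%
print(f"Per-token compute vs. an equally large dense model: 1/{dense_ratio:.0f}")
```

&lt;p&gt;The quoted 2-3x speedup is measured against peer models, not against a dense 398B baseline, so the realized gain is smaller than this raw ratio suggests.&lt;/p&gt;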

&lt;h2&gt;The Agentic Edge: Perfect for KiloClaw&lt;/h2&gt;

&lt;p&gt;The preview release of this model, Trinity Large Preview, has been free in Kilo for over two months and quickly rose to the top of the &lt;a href="https://openrouter.ai/apps?url=https%3A%2F%2Fkilocode.ai%2F" rel="noopener noreferrer"&gt;OpenRouter leaderboards&lt;/a&gt; for both Kilo Code (including KiloClaw) and OpenClaw.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frv30gp4vpqd5y6kqxb9k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frv30gp4vpqd5y6kqxb9k.png" alt="OpenRouter leaderboard" width="800" height="362"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The preview version of Trinity Large has been in Kilo's top 20 for over two months. (Snapshot is from past 30 days.)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;And that was just the &lt;em&gt;preview&lt;/em&gt;. While Trinity Large's architecture natively supports context windows up to 512k tokens, the Preview API served at 128k context using 8-bit quantization. &lt;strong&gt;Now you can use the full release for free, with a longer context that supports multiple turns.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Trinity-Large-Thinking wasn't built to ace trivia benchmarks. It was purpose-built for tool calling, multi-step planning, and agent workflows. &lt;strong&gt;This makes it an absolute monster when plugged into agentic features like &lt;a href="https://kilo.ai/kiloclaw" rel="noopener noreferrer"&gt;KiloClaw&lt;/a&gt; (our hosted OpenClaw environment).&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://x.com/pinchbench/status/2040885242756780235?s=20" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvay4oeflnfs50s30y99v.png" alt="PinchBench results" width="800" height="731"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Why is this new Trinity model so good for agentic use cases?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Native Reasoning Traces:&lt;/strong&gt; The model generates explicit reasoning traces before producing its final response.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context is Key:&lt;/strong&gt; This internal thinking process is critical to the model's performance. When running agentic loops in OpenClaw, these thinking tokens must be kept in context for multi-turn conversations to function correctly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Massive Memory:&lt;/strong&gt; To support these long reasoning chains across many agentic steps, the model offers an extended context window. It's particularly good at multi-turn tool use, context coherence, and instruction following across long-horizon agent runs.&lt;/li&gt;
&lt;/ul&gt;
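&lt;p&gt;In practice, "keep the thinking tokens in context" means the agent loop stores the reasoning trace alongside each assistant reply instead of discarding it. A minimal sketch, with a placeholder model call since the real API shape varies by provider:&lt;/p&gt;

```python
def call_model(messages):
    """Placeholder for a real LLM call that returns reasoning plus an answer."""
    return {"reasoning": "(thinking trace)", "content": "(final answer)"}

def run_turn(messages, user_input):
    messages.append({"role": "user", "content": user_input})
    reply = call_model(messages)
    # Store the reasoning trace with the reply; dropping it here is what
    # breaks multi-turn performance for reasoning-trace models.
    messages.append({
        "role": "assistant",
        "reasoning": reply["reasoning"],
        "content": reply["content"],
    })
    return messages

history = []
history = run_turn(history, "Refactor the parser")
history = run_turn(history, "Now add tests for it")
print(len(history))  # → 4: two user turns, two assistant turns with traces
```

&lt;p&gt;Agent frameworks that strip "thinking" from the transcript to save tokens quietly sabotage models like this one on turn two and beyond.&lt;/p&gt;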

&lt;h2&gt;Top of the PinchBench Index&lt;/h2&gt;

&lt;p&gt;We don't just take a lab's word for it; we look at the data. Our internal testing has found the model strong across &lt;a href="https://kilo.ai/kiloclaw/openclaw-for" rel="noopener noreferrer"&gt;OpenClaw use cases&lt;/a&gt; in KiloClaw.&lt;/p&gt;

&lt;p&gt;Arcee built this model focusing on the things that make agents feel real in practice: staying coherent across turns, using tools cleanly, and strictly following instructions.&lt;/p&gt;

&lt;p&gt;The results speak for themselves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Top-Tier Performance:&lt;/strong&gt; Initial testing saw Trinity Large Thinking rise to #2 on &lt;a href="https://pinchbench.com/" rel="noopener noreferrer"&gt;PinchBench&lt;/a&gt;, a benchmark measuring model capability on tasks relevant to agents like OpenClaw.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Heavyweight Challenger:&lt;/strong&gt; It sits just behind Claude Opus-4.6 in raw agentic capability.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unbeatable Economics:&lt;/strong&gt; While rivaling Opus-4.6, it lands at just $0.90 per million output tokens on Arcee's API, making it roughly &lt;strong&gt;96% cheaper&lt;/strong&gt;. (Plus it's currently free in Kilo — that's pretty affordable!)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At Kilo, we believe in avoiding vendor lock-in. Arcee shares that philosophy. They release model weights on Hugging Face under the Apache 2.0 license, and this has been true for &lt;a href="https://www.arcee.ai/trinity" rel="noopener noreferrer"&gt;all of their models&lt;/a&gt;. They built Trinity Large because they believe developers and enterprises need models they can inspect, post-train, host, distill, and truly own.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnqo61bm640r3bdx9yj50.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnqo61bm640r3bdx9yj50.png" alt="Arcee Apache 2.0" width="800" height="159"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Try it today in our CLI, IDE extensions, and agentic features like Kilo's &lt;a href="https://kilo.ai/features" rel="noopener noreferrer"&gt;Cloud Agents&lt;/a&gt; and &lt;a href="https://kilo.ai/kiloclaw" rel="noopener noreferrer"&gt;KiloClaw&lt;/a&gt;. You'll be glad you did.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>discuss</category>
      <category>coding</category>
    </item>
    <item>
      <title>PinchBench v2: Call for Contributors to the Leading OpenClaw Benchmark</title>
      <dc:creator>Darko from Kilo</dc:creator>
      <pubDate>Tue, 31 Mar 2026 14:07:30 +0000</pubDate>
      <link>https://dev.to/kilocode/pinchbench-v2-call-for-contributors-to-the-leading-openclaw-benchmark-3m4d</link>
      <guid>https://dev.to/kilocode/pinchbench-v2-call-for-contributors-to-the-leading-openclaw-benchmark-3m4d</guid>
      <description>&lt;p&gt;We're excited to announce that &lt;strong&gt;PinchBench v2&lt;/strong&gt; is now in active development --- and we're opening the doors for community contributions to help shape the next major release. 🦀&lt;/p&gt;

&lt;h2&gt;The Remarkable Rise of PinchBench&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://pinchbench.com/" rel="noopener noreferrer"&gt;PinchBench&lt;/a&gt; started as a side project of Kilo DevRel mastermind &lt;a href="https://x.com/olearycrew" rel="noopener noreferrer"&gt;Brendan O'Leary&lt;/a&gt;, who wanted to build a benchmarking system for evaluating LLM models as OpenClaw coding agents. The idea was simple: run tests based on real-world tasks to help users choose the right model for their use case. But my oh my, has that "side project" taken off!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk56b2yuun552zuwb984d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk56b2yuun552zuwb984d.png" width="800" height="451"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;NVIDIA GTC Keynote 2026&lt;/p&gt;

&lt;p&gt;During his recent keynote, NVIDIA CEO Jensen Huang showcased PinchBench on stage as a definitive standard for evaluating the real-world performance of OpenClaw agents. He highlighted &lt;a href="https://blog.kilo.ai/p/nvidia-nemotron-3-super-launch" rel="noopener noreferrer"&gt;Nemotron 3 Super&lt;/a&gt;'s performance as the top open-weight model for OpenClaw use cases.&lt;/p&gt;

&lt;p&gt;In the week that followed, MiniMax announced that it will soon release the weights for &lt;a href="https://blog.kilo.ai/p/minimax-m27" rel="noopener noreferrer"&gt;MiniMax-M2.7&lt;/a&gt;, and Z AI shared that the much-anticipated GLM-5.1 will also have open weights. The competition is heating up, and not just for OSS models. This is only the beginning of the agentic revolution.&lt;/p&gt;

&lt;p&gt;We need your help to make PinchBench even more useful and comprehensive. The era of generalized benchmarks is over. &lt;strong&gt;It's time for benchmarks that help you choose the best LLMs for always-on agents&lt;/strong&gt;, with a focus on specific skills that can be used around the clock in tools like &lt;a href="https://kilo.ai/kiloclaw" rel="noopener noreferrer"&gt;KiloClaw.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmftch8rprlc7ohiau7b8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmftch8rprlc7ohiau7b8.png" width="800" height="439"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;NVIDIA GTC Keynote 2026 (Full Screen!)&lt;/p&gt;

&lt;h2&gt;&lt;strong&gt;What We're Building&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://pinchbench.com/" rel="noopener noreferrer"&gt;PinchBench&lt;/a&gt; v2 is a significant leap forward. Our aim is to produce a benchmark that more accurately captures the real-world complexity of agentic tasks --- including longer task horizons, better verification, and a much richer picture of model performance across a wider set of domains. As &lt;a href="https://blog.kilo.ai/p/kiloclaw-updates-persistent-packages" rel="noopener noreferrer"&gt;KiloClaw continues to lead the charge&lt;/a&gt; for hosted OpenClaw ease-of-use, functionality and security, we want to make sure that PinchBench is equally ahead of the curve.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Our goal for v2 is 100 tasks&lt;/strong&gt;, and we're especially focused on testing across a wider range of OpenClaw use cases. We want contributions that reflect the kinds of tasks OpenClaw is actually being used for in practice, paired with rigorous success-rate measurement. If you're running OpenClaw in production or research contexts, you're exactly who we want to hear from.&lt;/p&gt;

&lt;p&gt;On the leaderboard side, we're investing in a substantially improved UI/UX --- better filtering, model landing pages, user profiles, per-task variance, and more --- to make results easier to understand and compare.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fee7aroy2hr1t5dgza633.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fee7aroy2hr1t5dgza633.png" width="800" height="594"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://pinchbench.com/" rel="noopener noreferrer"&gt;PinchBench&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;&lt;strong&gt;Open Call for Contributions&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The contribution window is open now through April 15th, 2026.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We are looking for two types of contributions: skills and leaderboard. You are welcome to contribute in both categories.&lt;/p&gt;

&lt;h3&gt;&lt;strong&gt;Skills Contributions&lt;/strong&gt;&lt;/h3&gt;

&lt;p&gt;Help us expand and improve the task suite:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;New tasks&lt;/strong&gt; --- What should OpenClaw be doing that we aren't currently measuring? We want tasks that represent real, valuable work: things a practitioner would actually run OpenClaw on, with clear and programmatically verifiable success criteria. Tasks should be relevant across both local and hosted OpenClaw instances --- including hosted services like &lt;a href="https://kilo.ai/kiloclaw" rel="noopener noreferrer"&gt;KiloClaw&lt;/a&gt; and KimiClaw.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Task improvements&lt;/strong&gt; --- Some existing tasks fail at high rates across nearly all models, and others may not reflect the current state of what OpenClaw can do. If you can identify, fix, or replace tasks that aren't pulling their weight, we want your PR.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Success rate coverage&lt;/strong&gt; --- Contributions that include baseline success rates across multiple models are especially valuable. Help us ensure the benchmark is neither too easy nor impossibly hard at release. It's all about real-world agentic use.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Good tasks should be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Realistic&lt;/strong&gt; --- something OpenClaw would genuinely be run on in a real workflow&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Clearly specified&lt;/strong&gt; --- a passing solution should unambiguously satisfy the task&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Well-calibrated in difficulty&lt;/strong&gt; --- ideally targeting a solve rate that distinguishes model capability&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Convention-compliant&lt;/strong&gt; --- all tasks must follow OpenClaw skill conventions to ensure consistency and compatibility across the benchmark&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;&lt;strong&gt;Leaderboard Contributions&lt;/strong&gt;&lt;/h3&gt;

&lt;p&gt;Help us build a leaderboard that's detailed, clear, relevant and accessible.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7h1w89c54gb0oaewxc9s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7h1w89c54gb0oaewxc9s.png" width="800" height="293"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We're working through a range of UI/UX improvements for v2, including redesigned filtering and navigation, model and contributor profile pages, improved scoring to eliminate run-size bias, and daily/weekly/monthly recognition badges. If you have front-end chops and care about how benchmark results are communicated, this is where we need you.&lt;/p&gt;

&lt;h2&gt;&lt;strong&gt;How to Contribute&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;There are no forms to fill out. Anybody can contribute.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Review the open issues in the &lt;a href="https://github.com/pinchbench/skill/issues/60" rel="noopener noreferrer"&gt;PinchBench v2 meta issue&lt;/a&gt; to understand what's in scope&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Propose a new task or improvement in GitHub Discussions or by opening an issue --- especially for OpenClaw-specific use cases you want to see covered&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Implement your contribution by forking the repo, building it out, and submitting a PR&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Iterate with reviewers to get your contribution merged&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;&lt;strong&gt;Recognition&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;Contributors will be recognized in the v2 release in two categories:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Skills Contributors&lt;/strong&gt; --- recognized for accepted new tasks and task improvements, ordered by number of accepted contributions&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Leaderboard Contributors&lt;/strong&gt; --- recognized for accepted UI/UX improvements to the leaderboard&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every accepted contribution counts. Whether it's one well-crafted task or a full leaderboard feature, we aim to acknowledge top community contributions in the release.&lt;/p&gt;

&lt;h2&gt;&lt;strong&gt;Get Involved&lt;/strong&gt;&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;GitHub: &lt;/strong&gt;&lt;a href="https://github.com/pinchbench/skill" rel="noopener noreferrer"&gt;pinchbench/skill&lt;/a&gt; --- browse open issues and the v2 meta issue&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;v2 Meta Issue: &lt;/strong&gt;&lt;a href="https://github.com/pinchbench/skill/issues/60" rel="noopener noreferrer"&gt;#60&lt;/a&gt; --- the full list of what's in scope for this release&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;PinchBench is a community project, and v2 will be shaped by the people who contribute to it. We'd love your help in improving the definitive benchmark for OpenClaw use cases. Learn more about &lt;a href="https://pinchbench.com/about" rel="noopener noreferrer"&gt;PinchBench&lt;/a&gt; and &lt;a href="https://kilo.ai/kiloclaw" rel="noopener noreferrer"&gt;KiloClaw&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>contributorswanted</category>
      <category>openclaw</category>
    </item>
    <item>
      <title>The Cost of Always-On Agents is Less Than You Might Think</title>
      <dc:creator>Darko from Kilo</dc:creator>
      <pubDate>Tue, 31 Mar 2026 14:04:26 +0000</pubDate>
      <link>https://dev.to/kilocode/the-cost-of-always-on-agents-is-less-than-you-might-think-ho4</link>
      <guid>https://dev.to/kilocode/the-cost-of-always-on-agents-is-less-than-you-might-think-ho4</guid>
      <description>&lt;p&gt;There's a growing assumption in AI right now:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If agents are always running, costs will spiral.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This sounds reasonable. More autonomy should mean more tokens and more compute. More tokens and more compute should mean higher bills.&lt;/p&gt;

&lt;p&gt;But that mental model is already breaking. Why? Because it assumes you're paying for &lt;strong&gt;outputs&lt;/strong&gt;---individual prompts and responses.&lt;/p&gt;

&lt;p&gt;In reality, with new agentic systems like OpenClaw, you're paying for something very different:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Ongoing throughput---work completed over time.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Once we understand that shift---the move from prompts and specific outputs to a model that focuses on ongoing throughput and persistent memory---the economics start to look completely different.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3c9e4qgz6ytbn5u9pt7n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3c9e4qgz6ytbn5u9pt7n.png" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;&lt;strong&gt;The Outdated Way to Think About Cost&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;Most teams still evaluate AI like an API:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Cost per token&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Cost per request&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Cost per response&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That might work for chat, but it fails for agents. Agents don't just respond once. Instead, they plan, break work into steps, execute across tools, revisit and improve outputs, and (if everything is working correctly) they continue operating after the initial trigger.&lt;/p&gt;

&lt;p&gt;So the real question isn't "how much does this prompt cost?" but &lt;strong&gt;"how much useful work can I get done for a small amount of money?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faxflirgbtcm5t5sdkbx7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faxflirgbtcm5t5sdkbx7.png" width="800" height="484"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://pinchbench.com/?view=cost" rel="noopener noreferrer"&gt;Filtering by cost&lt;/a&gt; in PinchBench&lt;/p&gt;

&lt;h2&gt;&lt;strong&gt;What the Data Actually Shows&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;Benchmarks like PinchBench measure something more meaningful than tokens: &lt;strong&gt;cost per completed agent task.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here's a snapshot of &lt;a href="https://pinchbench.com/" rel="noopener noreferrer"&gt;current value rankings&lt;/a&gt;. A few things jump out immediately:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;High-value models like Opus complete full tasks for &lt;strong&gt;$0.03--$0.13&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Even strong mid-tier models like Kimi K2.5 stay well under &lt;strong&gt;$0.50 per task&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;em&gt;Average success rates&lt;/em&gt; cluster surprisingly close (65--85%) despite major cost differences&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This leads to a non-obvious conclusion:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;You're often paying &lt;strong&gt;10--20x more&lt;/strong&gt; for marginal gains in quality.&lt;/p&gt;
&lt;/blockquote&gt;
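&lt;p&gt;You can see where that multiplier comes from with a quick calculation. What matters is cost per &lt;em&gt;successful&lt;/em&gt; task: cost per attempt divided by success rate. The numbers below are hypothetical, not PinchBench data:&lt;/p&gt;

```python
# Hypothetical models: a pricey frontier model vs. a cheap mid-tier one.
models = {
    "frontier": {"cost_per_task": 0.60, "success_rate": 0.85},
    "mid_tier": {"cost_per_task": 0.04, "success_rate": 0.70},
}

def cost_per_success(m):
    """Expected spend to get one completed task, retries included."""
    return m["cost_per_task"] / m["success_rate"]

for name, m in models.items():
    print(f"{name}: ${cost_per_success(m):.3f} per successful task")

premium = cost_per_success(models["frontier"]) / cost_per_success(models["mid_tier"])
print(f"Premium: about {premium:.0f}x for 15 points of success rate")
```

&lt;p&gt;Whether that premium is worth it depends on the task: a 12x markup may be justified for a one-shot production deploy, and wasteful for a nightly batch job you can simply retry.&lt;/p&gt;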

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fudeedh92bov3fp5mknkh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fudeedh92bov3fp5mknkh.png" width="800" height="666"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://pinchbench.com/" rel="noopener noreferrer"&gt;Filtering by success rate&lt;/a&gt; in PinchBench&lt;/p&gt;

&lt;h2&gt;&lt;strong&gt;What $10 Gets You in OpenClaw&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;We often have free models like &lt;a href="https://kilo.ai/leaderboard" rel="noopener noreferrer"&gt;Nemotron 3 Super, Trinity Large Preview and MiMo-V2-Pro&lt;/a&gt; available in Kilo, but even if you're opting for paid models, you can get a LOT for $10. A ten-spot will buy you a lot more than 10 turns in your agent chat.&lt;/p&gt;

&lt;p&gt;Let's translate those numbers into something real.&lt;/p&gt;

&lt;h3&gt;&lt;strong&gt;Without Agents: Linear Output&lt;/strong&gt;&lt;/h3&gt;

&lt;p&gt;If you're coding or prompting manually:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;You rely on frontier models&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You resend context every time&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;You manually trigger every step&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Work stops when you stop&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;$10 gets you around 2--4 meaningful tasks.&lt;/strong&gt; Then it's on to the next project.&lt;/p&gt;

&lt;h3&gt;&lt;strong&gt;With KiloClaw: Compounding Output&lt;/strong&gt;&lt;/h3&gt;

&lt;p&gt;With a hosted OpenClaw agent like KiloClaw, that same $10 is distributed across a system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;sub-agents handling different responsibilities&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;multiple model tiers with different costs&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;cached context reused across runs&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;scheduled execution instead of constant prompting&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In KiloClaw, &lt;strong&gt;$10 gets you around 20--150+ agent task executions.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Of course there's some variance depending on which &lt;a href="https://kilo.ai/kiloclaw/bytes" rel="noopener noreferrer"&gt;tasks and skills&lt;/a&gt; you're focused on. But still. This is huge. And it's honestly a lot more than we were expecting when we started spinning up claws.&lt;/p&gt;

&lt;p&gt;More importantly, &lt;em&gt;the system keeps working after you stop&lt;/em&gt;. Sub-agents reduce waste, memory persists, and &lt;strong&gt;auto model routing can further decrease costs by 5-10x&lt;/strong&gt;. Most agentic tasks don't actually need the "best" model. With auto routing now available in different modes in Kilo, including in KiloClaw, you can pick a mode during onboarding and update at any time.&lt;/p&gt;
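&lt;p&gt;A rough version of the budget math behind that range. The tier mix and per-task costs below are illustrative assumptions, not Kilo's actual pricing:&lt;/p&gt;

```python
BUDGET = 10.00  # dollars

# Hypothetical routing mix: most tasks go to cheap models, few to frontier.
tiers = {
    "cheap":    {"cost_per_task": 0.03, "share": 0.70},
    "mid":      {"cost_per_task": 0.15, "share": 0.25},
    "frontier": {"cost_per_task": 0.60, "share": 0.05},
}

blended = sum(t["cost_per_task"] * t["share"] for t in tiers.values())
print(f"Blended cost per task: ${blended:.4f}")
print(f"Task executions per ${BUDGET:.0f}: about {BUDGET / blended:.0f}")
```

&lt;p&gt;Shift the mix toward frontier models and the count drops toward the low end of the range; lean harder on cheap tiers and cached context and it climbs past 150.&lt;/p&gt;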

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0bhdbfe0wy48yn7hlfzi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0bhdbfe0wy48yn7hlfzi.png" width="800" height="432"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Current Kilo Auto Modes. Models and modes subject to change!&lt;/p&gt;

&lt;p&gt;Looking to take advantage of high-efficiency models as well as super-powerful models like &lt;a href="https://blog.kilo.ai/p/what-we-learned-from-a-week-of-free" rel="noopener noreferrer"&gt;Kimi K2.5&lt;/a&gt; and &lt;a href="https://blog.kilo.ai/p/we-tested-minimax-m27-against-claude" rel="noopener noreferrer"&gt;MiniMax M2.7&lt;/a&gt;? Choose &lt;strong&gt;Balanced Mode&lt;/strong&gt; and we'll route between models for you.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Why "Agentic Engineering" Was Inevitable&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;This isn't just a cost story. It's a shift in how software gets built, whether that's full production software for a new startup or your own personal AI assistant with something like &lt;a href="https://kilo.ai/kiloclaw" rel="noopener noreferrer"&gt;KiloClaw&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We're entering the era of &lt;strong&gt;agentic engineering&lt;/strong&gt;---where multiple agents collaborate across planning, implementation, debugging, and deployment.&lt;/p&gt;

&lt;p&gt;This isn't hype. It's already happening:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Code gets written, reviewed, and deployed in a single loop&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Long-running tasks move into persistent cloud agents&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Developers supervise systems instead of executing every step&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The role of the developer is changing---from builder to orchestrator. And with OpenClaw the role of everyday AI users is changing too---from consumer to conductor.&lt;/p&gt;

&lt;p&gt;And once that happens, cost behaves differently. Efficiency is no longer about a single request---it's about how well the system runs over time.&lt;/p&gt;

&lt;p&gt;Platforms that unify this workflow---IDE, CLI, cloud, and collaboration---don't just improve productivity. They become the default interface for building software. This is what we've been building at Kilo since the beginning, and the rise of KiloClaw is just the next phase of this (very fast) evolution.&lt;/p&gt;

&lt;p&gt;Check out &lt;a href="https://pinchbench.com/" rel="noopener noreferrer"&gt;PinchBench&lt;/a&gt; for the best OpenClaw benchmarks, and &lt;a href="https://kilo.ai/kiloclaw" rel="noopener noreferrer"&gt;launch your own claw &lt;/a&gt;in minutes with Kilo! 🦀&lt;/p&gt;

</description>
      <category>openclaw</category>
      <category>ai</category>
      <category>agents</category>
      <category>discuss</category>
    </item>
    <item>
      <title>We Tested MiniMax M2.7 Against Claude Opus 4.6</title>
      <dc:creator>Darko from Kilo</dc:creator>
      <pubDate>Wed, 25 Mar 2026 08:11:33 +0000</pubDate>
      <link>https://dev.to/kilocode/we-tested-minimax-m27-against-claude-opus-46-1ii9</link>
      <guid>https://dev.to/kilocode/we-tested-minimax-m27-against-claude-opus-46-1ii9</guid>
      <description>&lt;p&gt;&lt;a href="https://www.minimax.io/models/text/m27" rel="noopener noreferrer"&gt;MiniMax M2.7&lt;/a&gt; launched on March 18 scoring 56.22% on SWE-Pro, close to Claude Opus 4.6. We ran both models through three coding tasks in &lt;a href="https://kilocode.ai/" rel="noopener noreferrer"&gt;Kilo Code&lt;/a&gt; to see if the benchmark numbers hold up in practice. On pricing, MiniMax M2.7 runs at $0.30/$1.20 per million tokens (input/output) compared to Claude Opus 4.6's $5/$25, roughly a &lt;strong&gt;17x difference on input and 21x on output&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsubstackcdn.com%2Fimage%2Ffetch%2F%24s_%21emdv%21%2Cw_1456%2Cc_limit%2Cf_auto%2Cq_auto%3Agood%2Cfl_progressive%3Asteep%2Fhttps%253A%252F%252Fsubstack-post-media.s3.amazonaws.com%252Fpublic%252Fimages%252Ff61f6e60-9bc5-4d4d-8f85-3bd602ff54cc_3000x1490.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsubstackcdn.com%2Fimage%2Ffetch%2F%24s_%21emdv%21%2Cw_1456%2Cc_limit%2Cf_auto%2Cq_auto%3Agood%2Cfl_progressive%3Asteep%2Fhttps%253A%252F%252Fsubstack-post-media.s3.amazonaws.com%252Fpublic%252Fimages%252Ff61f6e60-9bc5-4d4d-8f85-3bd602ff54cc_3000x1490.jpeg" title="Value Icon" alt="Value Icon" width="800" height="397"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; Both models found &lt;strong&gt;all 6 bugs and all 10 security vulnerabilities&lt;/strong&gt; in our tests. Claude Opus 4.6 produced more thorough fixes and 2x more tests. MiniMax M2.7 delivered &lt;strong&gt;90% of the quality for 7% of the cost&lt;/strong&gt; ($0.27 total vs $3.67).&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Test Design&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;We created three TypeScript codebases and ran both models in Code mode in &lt;a href="https://kilocode.ai/" rel="noopener noreferrer"&gt;Kilo Code&lt;/a&gt; for VS Code. Each model received the same prompt with no hints. We scored each model independently after all tests were complete.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Test 1: Full-Stack Event Processing System (35 points)&lt;/strong&gt; - Build a complete system from a spec, including async pipeline, WebSocket streaming, and rate limiting&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Test 2: Bug Investigation from Symptoms (30 points)&lt;/strong&gt; - Trace 6 bugs from production log output to root causes and fix them&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Test 3: Security Audit (35 points)&lt;/strong&gt; - Find and fix 10 planted security vulnerabilities across a team collaboration API&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Test 1: Full-Stack Event Processing System&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;We gave both models this prompt:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Build a real-time event processing system in TypeScript from the specification in &lt;a class="mentioned-user" href="https://dev.to/spec"&gt;@spec&lt;/a&gt;.md. Use Hono for the web framework, Prisma with SQLite for the database, Zod for input validation, and ws for WebSocket support."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The spec required 7 components: event ingestion API with API key auth, async processing pipeline with exponential backoff retry, event storage with processing history, query API with pagination and filtering, WebSocket endpoint for live streaming, per-key rate limiting, and health/metrics endpoints.&lt;/p&gt;
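&lt;p&gt;Neither model's retry code is shown here, but the exponential-backoff pattern the spec asks for usually looks something like this (a minimal Python sketch for illustration; the real implementations were TypeScript, and the function name and defaults are ours):&lt;/p&gt;

```python
import random
import time

def retry_with_backoff(task, max_attempts=5, base_delay=0.5):
    """Retry a failing task, doubling the wait between attempts."""
    for attempt in range(max_attempts):
        try:
            return task()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure
            # exponential backoff, plus jitter so retries don't synchronize
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)
```

&lt;p&gt;The delay grows as 0.5s, 1s, 2s, 4s..., which is what keeps a flaky downstream dependency from being hammered by a hot retry loop.&lt;/p&gt;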

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2kvoye9dqs2c9csvi9b6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2kvoye9dqs2c9csvi9b6.png" width="800" height="207"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Both models implemented all 7 components. The score difference came from code organization and test coverage.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F77xum11232j9x0urywi7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F77xum11232j9x0urywi7.png" width="800" height="591"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Architecture&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Claude Opus 4.6 created a modular directory structure with separate directories for routes, pipeline, middleware, and WebSocket management. It split the processing logic into separate files for queue management (with retry scheduling and dead-letter routing) and per-type event handlers. It also included graceful shutdown with timer cleanup.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3lvr4yh983l2shmvxzf9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3lvr4yh983l2shmvxzf9.png" width="800" height="561"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;MiniMax M2.7 used a flatter structure with fewer files. All routing lived in a single entry file, and the processor was simpler with no shutdown management or timer tracking.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F62bjlbqzkq7mvm83yovy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F62bjlbqzkq7mvm83yovy.png" width="800" height="561"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Test Coverage&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Claude Opus 4.6 wrote &lt;strong&gt;41 integration tests&lt;/strong&gt; with a dedicated test database and proper cleanup between tests. The tests make actual HTTP requests against the API, testing the full middleware chain end-to-end.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8uufcyn7orra8mhf2mhp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8uufcyn7orra8mhf2mhp.png" width="800" height="434"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;MiniMax M2.7 wrote &lt;strong&gt;20 unit tests&lt;/strong&gt; that validate Zod schemas and handler functions directly. These cover the core logic, but don't test the API endpoints or middleware through HTTP, so routing or middleware bugs would slip through.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Test 1 Scoring&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwyz1f7sm02wg13ef6om9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwyz1f7sm02wg13ef6om9.png" width="800" height="492"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Claude Opus 4.6 lost 2 points for not generating a README (the spec asked for one). MiniMax M2.7 generated a README but lost points on architecture and test coverage.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Test 2: Bug Investigation from Symptoms&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;We built an order processing system with 4 interconnected modules (gateway, orders, inventory, notifications) and planted 6 bugs. We gave both models the codebase, a production log file showing symptoms, and a memory profile showing growth data. The prompt listed the 6 symptoms and asked both models to investigate, find root causes, and fix them.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fujbhda7qbl5zo6na0pif.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fujbhda7qbl5zo6na0pif.png" width="800" height="190"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Both models found all 6 root causes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo6iaduf67jpl5qg602a9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo6iaduf67jpl5qg602a9.png" width="800" height="576"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Bug #1: Race Condition in Inventory&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Stock was checked first, then reserved in a separate transaction. Two concurrent orders could both pass the check before either reserved. Both models identified this from the logs and fixed it by making the reservation atomic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude Opus 4.6&lt;/strong&gt; also added &lt;strong&gt;rollback logic&lt;/strong&gt;. If reserving stock for one item in a multi-item order fails, it releases the items that already succeeded and marks the order as "failed." &lt;strong&gt;MiniMax M2.7&lt;/strong&gt; made the reservation atomic but &lt;strong&gt;didn't add rollback&lt;/strong&gt;, so partial failures can leave orphaned reservations.&lt;/p&gt;
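&lt;p&gt;A minimal sketch of the atomic-reservation-with-rollback pattern (Python, with a lock standing in for a database transaction; this is an illustration of the fix's shape, not either model's actual code):&lt;/p&gt;

```python
import threading

class Inventory:
    """Toy in-memory store; the lock plays the role of a DB transaction."""
    def __init__(self, stock):
        self.stock = dict(stock)   # sku mapped to units available
        self.lock = threading.Lock()

    def reserve_order(self, items):
        """Atomically reserve every line item, or none of them.

        items: list of (sku, qty) pairs. On a partial failure, the
        lines that already succeeded are rolled back, so no orphaned
        reservations are left behind.
        """
        reserved = []
        with self.lock:  # check and reserve in one critical section
            for sku, qty in items:
                if self.stock.get(sku, 0) >= qty:
                    self.stock[sku] -= qty
                    reserved.append((sku, qty))
                else:
                    # rollback: release the lines that already succeeded
                    for done_sku, done_qty in reserved:
                        self.stock[done_sku] += done_qty
                    return False
            return True
```

&lt;p&gt;Making check-and-reserve a single critical section closes the race; the rollback loop is the extra step that handles partial failures on multi-item orders.&lt;/p&gt;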

&lt;h3&gt;
  
  
  &lt;strong&gt;Bug #4: Floating-Point Totals&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The order total calculation used standard floating-point arithmetic, which produces results like &lt;code&gt;159.92000000000002&lt;/code&gt; for certain price and quantity combinations. The logs showed repeated "Total validation warning" entries where the expected and calculated totals differed by tiny fractions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude Opus 4.6&lt;/strong&gt; rounded the result after calculation:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faoudpir0fdlegazkhi14.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faoudpir0fdlegazkhi14.png" width="800" height="130"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MiniMax M2.7&lt;/strong&gt; converted to integer math (cents), avoiding the precision problem entirely:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2s89y5yyau96uvm8yofj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2s89y5yyau96uvm8yofj.png" width="800" height="181"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;MiniMax M2.7's approach is technically better here. Working in cents avoids accumulation errors that rounding after the fact can miss on large orders.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Remaining Bugs&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Both models fixed the other 4 bugs with the same approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Notification ordering (Bug #2)&lt;/strong&gt;: Added a status check before sending confirmation emails, skipping orders that were already cancelled&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Memory leak (Bug #3)&lt;/strong&gt;: Removed a per-order event listener that was never cleaned up, accumulating with each request (the memory profile showed listener count tracking 1:1 with request count)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Stale inventory cache (Bug #5)&lt;/strong&gt;: Added cache invalidation calls after stock updates, so the 60-second cache TTL no longer serves stale data&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Token revocation bypass (Bug #6)&lt;/strong&gt;: Removed a "5-minute optimization" that skipped the revocation check for fresh tokens&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
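&lt;p&gt;The memory-leak fix (Bug #3) follows a pattern worth spelling out: register a listener per request and it accumulates forever unless something removes it. A toy Python emitter shows the leaky shape and the fix (illustrative only; the real system was a TypeScript/Node event emitter):&lt;/p&gt;

```python
class Emitter:
    """Minimal event emitter, just enough to show the leak pattern."""
    def __init__(self):
        self.listeners = []

    def on(self, fn):
        self.listeners.append(fn)

    def off(self, fn):
        self.listeners.remove(fn)

def handle_order_leaky(emitter, order):
    # bug: registers a listener per order and never removes it,
    # so the listener count tracks 1:1 with the request count
    emitter.on(lambda event: None)

def handle_order_fixed(emitter, order):
    # fix: unregister the listener once the order is handled
    listener = lambda event: None
    emitter.on(listener)
    try:
        pass  # ...process the order...
    finally:
        emitter.off(listener)
```

&lt;p&gt;After 100 requests the leaky handler leaves 100 listeners registered; the fixed one leaves zero, which is exactly the 1:1 growth the memory profile exposed.&lt;/p&gt;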

&lt;h3&gt;
  
  
  &lt;strong&gt;Test 2 Scoring&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhib87faubskwwox5qxjc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhib87faubskwwox5qxjc.png" width="800" height="391"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Both models verified their fixes by running curl requests against the server. Claude Opus 4.6 explicitly referenced log entries when explaining each bug, while MiniMax M2.7 jumped more directly to the code.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Test 3: Security Audit&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;We built a team collaboration API (Hono + Prisma + SQLite) with 10 planted security vulnerabilities. We asked both models to audit the codebase, categorize each vulnerability by OWASP, explain the attack vector, rate severity, and implement fixes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpoj90n8xocvoozh1ypxp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpoj90n8xocvoozh1ypxp.png" width="800" height="239"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Both models found all 10 vulnerabilities with correct OWASP categorizations. The 4-point gap is entirely in fix quality.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe6rotps5o5me4x5bhz4r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe6rotps5o5me4x5bhz4r.png" width="800" height="597"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Where the Fixes Diverged&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Password hashing&lt;/strong&gt;: Claude Opus 4.6 used scrypt with random salts and timing-safe comparison. MiniMax M2.7 used SHA-256 with the JWT secret as the salt, and flagged in its own output that bcrypt would be better.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Insecure deserialization&lt;/strong&gt;: Both removed the &lt;code&gt;eval()&lt;/code&gt; on webhook transforms. Claude Opus 4.6 replaced it with a safe JSON key-mapping system. MiniMax M2.7 disabled transforms entirely.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;SSRF protection&lt;/strong&gt;: Claude Opus 4.6 validated webhook URLs at creation, update, and delivery. MiniMax M2.7 validated at delivery only.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Rate limiting&lt;/strong&gt;: Claude Opus 4.6 applied per-endpoint limits (login, register, password reset). MiniMax M2.7 only rate-limited the login endpoint.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;JWT fix&lt;/strong&gt;: Both moved the hardcoded secret to an environment variable. Claude Opus 4.6 let &lt;code&gt;jwt.verify()&lt;/code&gt; handle expiration natively. MiniMax M2.7 fixed the broken manual comparison, which works but duplicates built-in functionality.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
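&lt;p&gt;For reference, a salted-scrypt scheme with a timing-safe comparison, like the one Claude Opus 4.6 chose, can be sketched with nothing but the standard library (Python here; the audited codebase was TypeScript, so treat this as the pattern rather than the shipped fix):&lt;/p&gt;

```python
import hashlib
import hmac
import os

def hash_password(password):
    """Derive an scrypt hash with a fresh random salt per password."""
    salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return salt, digest

def verify_password(password, salt, digest):
    candidate = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    # timing-safe comparison: takes constant time no matter where
    # the bytes first differ, so it leaks nothing to an attacker
    return hmac.compare_digest(candidate, digest)
```

&lt;p&gt;The random per-password salt is what MiniMax's SHA-256-with-a-shared-salt fix lacked: without it, identical passwords hash identically and precomputed tables work across the whole user base.&lt;/p&gt;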

&lt;h3&gt;
  
  
  &lt;strong&gt;Test 3 Scoring&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvnz6smk55y9ran7r9xkg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvnz6smk55y9ran7r9xkg.png" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Overall Results&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9eujwor18p5q1s46nojo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9eujwor18p5q1s46nojo.png" width="800" height="248"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The Bigger Picture&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;We've been testing MiniMax models since M2 last November. Earlier versions competed against other open-weight models like GLM 4.7 and GLM-5. With each release, the scores climbed and the cost stayed low. MiniMax M2.5 (the previous version) is currently the #1 most-used model across every mode in Kilo Code, ahead of Claude Opus 4.6, GLM-5, and GPT-5.4. In Code mode it accounts for 37% of all usage. In Ask mode, 35%.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fib5suxlnvt5jkt0a0ytv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fib5suxlnvt5jkt0a0ytv.png" width="800" height="357"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;MiniMax M2.5 usage across Kilo Code modes&lt;/p&gt;

&lt;p&gt;MiniMax M2.7 is the first version where we felt the right comparison was a frontier model rather than another open-weight one. It matched Claude Opus 4.6's detection rate on every test in this benchmark, finding the same bugs and the same vulnerabilities. The fixes aren't as thorough yet, but the diagnostic gap between open-weight and frontier models is shrinking with every release.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Takeaways&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;For building from scratch&lt;/strong&gt;: Claude Opus 4.6 produced 41 integration tests and a modular architecture. MiniMax M2.7 built the same features with 20 unit tests and a flatter structure, at $0.13 vs $1.49.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For debugging&lt;/strong&gt;: Both models found all 6 root causes from log symptoms. MiniMax M2.7 even produced a better fix for the floating-point bug. Claude Opus 4.6 added rollback logic that MiniMax M2.7 missed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For security work&lt;/strong&gt;: Both models found all 10 vulnerabilities. Claude Opus 4.6's fixes are closer to what you'd ship (proper key derivation, feature-preserving alternatives, defense-in-depth). MiniMax M2.7 closes the same vulnerabilities with simpler approaches and sometimes flags its own shortcuts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;On cost&lt;/strong&gt;: $3.67 total for Claude Opus 4.6 vs $0.27 for MiniMax M2.7. Detection was identical. The gap is in how thorough the fixes are.&lt;/p&gt;




</description>
      <category>ai</category>
      <category>coding</category>
      <category>llm</category>
      <category>testing</category>
    </item>
    <item>
      <title>Talk to the Claw: The Interface Is Now a Single Sentence</title>
      <dc:creator>Darko from Kilo</dc:creator>
      <pubDate>Fri, 20 Mar 2026 09:00:00 +0000</pubDate>
      <link>https://dev.to/kilocode/talk-to-the-claw-the-interface-is-now-a-single-sentence-h3</link>
      <guid>https://dev.to/kilocode/talk-to-the-claw-the-interface-is-now-a-single-sentence-h3</guid>
      <description>&lt;p&gt;We hear it a lot these days, but what does it actually mean for software to have a "new interface"?&lt;/p&gt;

&lt;p&gt;At Kilo, we aren't approaching this question in the abstract---we're living it every day.&lt;/p&gt;

&lt;p&gt;As we lean into agentic flows, we're discovering that working in a new interface means that the layer between you and the tool is no longer a dashboard, a form, or a button. &lt;strong&gt;It's a sentence.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You will still hear people talk about UX improvements. Better navigation. Cleaner design. More intuitive onboarding flows. It will be framed as progress.&lt;/p&gt;

&lt;p&gt;But the real change runs deeper than any redesign. The interface layer is decoupling from the application layer entirely. You don't need to know where the button is. You don't need to learn the menu structure. You just say what you need done.&lt;/p&gt;

&lt;p&gt;Natural language &lt;em&gt;is&lt;/em&gt; the new UI.&lt;/p&gt;

&lt;p&gt;A couple of things I'm not saying:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;I'm not saying every app will disappear.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;I'm not saying this works perfectly today for every use case.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;I'm not saying you should throw away your existing workflows.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But here's what I &lt;em&gt;am&lt;/em&gt; saying: the apps you already use didn't have to rebuild themselves from scratch for this to be true. &lt;a href="https://kilo.ai/kiloclaw" rel="noopener noreferrer"&gt;KiloClaw&lt;/a&gt; can talk to &lt;a href="https://www.todoist.com/" rel="noopener noreferrer"&gt;Todoist&lt;/a&gt; &lt;em&gt;and&lt;/em&gt; &lt;a href="https://linear.app/" rel="noopener noreferrer"&gt;Linear&lt;/a&gt; &lt;em&gt;and&lt;/em&gt; your calendar &lt;em&gt;and&lt;/em&gt; your inbox -- through the same window, using the same language you'd use to text a colleague. You don't have to live inside each one to operate them.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr114rkdesbwepqs820jo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr114rkdesbwepqs820jo.png" width="800" height="745"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Credit: Todoist&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This isn't about saving five minutes. It's about a bigger shift. The way we interact with software is fundamentally changing.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Twelve Tools, One Front Door&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Here's where the new interface really shines.&lt;/p&gt;

&lt;p&gt;Last week, I had a new project land in my inbox. I downloaded the PDF, uploaded it to my &lt;a href="https://kilo.ai/kiloclaw" rel="noopener noreferrer"&gt;KiloClaw bot on Telegram&lt;/a&gt;, and typed a simple prompt in natural language, essentially: &lt;em&gt;Create a Todoist project for this and add the tasks based on these guidelines.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That's it. No excessive bulleted lists. No diagrams. No long paragraphs discussing the background and goals for this project. Just a couple of sentences.&lt;/p&gt;

&lt;p&gt;Thirty seconds later, it was done.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffu0f1gun6xa2fqy4j703.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffu0f1gun6xa2fqy4j703.png" width="800" height="793"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And the same thing with scheduling.&lt;/p&gt;

&lt;p&gt;I was meeting with a friend and colleague, and we agreed to sync again the following week. We both pulled up our calendars and found a time. I sent a message to KiloClaw. My contact received a calendar invite a minute later.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqj047b40ykwlw364v6if.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqj047b40ykwlw364v6if.png" width="800" height="364"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Two different tools. Two different workflows. One conversation.&lt;/p&gt;

&lt;p&gt;Here's the thing: Todoist actually has a feature for this. It's called Ramble -- you can talk to it, describe your project, and it populates tasks for you. That's cool. But that's not the unlock I'm talking about.&lt;/p&gt;

&lt;p&gt;I'm the kind of person who has a different tool for everything. Todoist for tasks. GitHub for engineering projects. &lt;a href="https://kilo.ai/slack" rel="noopener noreferrer"&gt;Slack for team communication&lt;/a&gt;. Gmail for email. Each tool lives in its own silo, with its own interface, its own learning curve, its own quirks.&lt;/p&gt;

&lt;p&gt;The problem has never been the tools.&lt;/p&gt;

&lt;p&gt;The problem is the twelve different front doors. With a unified interface that acts on natural language, we now have a single way into the house.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;The New Interface Is the Front Door We Always Needed&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Count the apps you opened before lunch today.&lt;/p&gt;

&lt;p&gt;Email. Slack. Calendar. Linear. Todoist.&lt;/p&gt;

&lt;p&gt;They're all like different doors into your life, each with its own login, its own layout, its own way of asking you to do the same basic thing: move information from your head into the right place.&lt;/p&gt;

&lt;p&gt;That tax -- the constant context-switching, the re-orienting, the "where does this live?" -- is so familiar that most of us stopped noticing it.&lt;/p&gt;

&lt;p&gt;We got so used to micro context-switching that we forgot there could be a better way.&lt;/p&gt;

&lt;p&gt;Curious?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Here's what I recommend you to do to get started: &lt;/strong&gt;Start with one workflow you do repeatedly. Something tedious. Something where you're just copying information from one place to another. Tell &lt;a href="https://blog.kilo.ai/p/open-claw-is-my-intern" rel="noopener noreferrer"&gt;your bot&lt;/a&gt; to do it instead.&lt;/p&gt;

&lt;p&gt;You might be surprised how short the conversation needs to be.&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
