Woodrow Brown

mnemospark e2e testing: OpenClaw in Action

OpenClaw Challenge Submission 🦞

This is a submission for the OpenClaw Challenge.

What I Built

While I was tightening up mnemospark inside OpenClaw, I kept running into the same practical truth: proving the command surface exists is not the same thing as proving the workflow is real.

First a short primer on mnemospark so you can grok the context.

I built an OpenClaw plugin that gives agents secure access to cloud storage, paid via x402 with USDC on Base. It gives agents a safe place to store, retrieve, and protect important files without human hand-holding. Instead of juggling file backup and restoration commands manually, an OpenClaw agent can back up files and download them when needed. It is a simple, secure file storage workflow made for an AI-first world. Check out the repo to learn more: https://github.com/pawlsclick/mnemospark

A backup plugin is only as trustworthy as its full operational path. It is not enough to show that backup returns an object ID. The real question is whether the system can carry a file all the way through the storage lifecycle without hand-waving, special pleading, or hidden operator rescue in the middle.

That is the practical origin of my mnemospark end-to-end testing loop.

The loop is intentionally simple:

  1. backup
  2. price-storage
  3. upload
  4. ls
  5. download
  6. delete

If the release can survive that sequence against the real stack, it is behaving like a storage system. If it cannot, then the failure is not theoretical, it is already in the user path.

So I started using that flow as the release gate for mnemospark work inside OpenClaw itself. Why not have my OpenClaw agent test my OpenClaw plugin for me end-to-end on every release? Same file, same sequence, same standard every time. Either it clears the whole path or it does not.
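That gate is easy to capture in script form. Here is a minimal sketch of the sequence; the subcommand names come from the loop above, but the exact invocation syntax and arguments are assumptions (the real plugin is Node-based and its flags may differ), so the runner takes the CLI as a parameter:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Release-gate sketch: run the full storage lifecycle against one test file.
# "$cli" stands in for however mnemospark is invoked on your install;
# the subcommand names match the loop above, the arguments are assumed.
run_lifecycle() {
  local cli="$1" file="$2"
  "$cli" backup "$file"          # 1. backup
  "$cli" price-storage "$file"   # 2. get a quote
  "$cli" upload "$file"          # 3. upload
  "$cli" ls                      # 4. list objects + renewal metadata
  "$cli" download "$file"        # 5. retrieve the artifact
  "$cli" delete "$file"          # 6. delete and clean up cron state
}
```

Because the invocation is abstracted behind `$cli`, the same script works whether mnemospark is called directly or through an agent wrapper, and `set -e` aborts the run the moment any step in the path fails.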

How I Used OpenClaw

While running those tests, another issue became obvious.

OpenClaw’s main agent session was not the right execution home for mnemospark shell work.

The problem was not that OpenClaw could not run the commands. The problem was that the execution model was too indirect and too brittle for a plugin workflow that depends on repeated Node-based CLI invocations. I saw exactly what that brittleness looks like: allowlist misses on the main agent, approval churn in the wrong place, and a system that technically knew how to execute mnemospark while still failing to behave like a dependable operator.

So I made a different call.

Instead of forcing mnemospark through the generic main-agent path, my OpenClaw release-testing agent Joe and I built a dedicated mnemospark agent for interactive workflows, separate from the existing mnemospark-renewal agent used for renewal cron execution.

That separation matters.

The renewal agent has one job: settle scheduled renewals. It is narrow by design.

The interactive agent has a different job entirely: wallet inspection, backup, pricing, upload, list, download, delete, and workflow troubleshooting. That path is broader, more conversational, and more sensitive to agent execution behavior. Trying to collapse those roles into one generic path made the system noisier and less predictable than it needed to be.

So we formalized the split:

  • mnemospark-renewal for cron-driven renewal execution
  • mnemospark for interactive and manual storage workflows

We also gave the interactive agent the shape it actually needed:

  • deny: ["subagents"] to keep execution local and predictable
  • exec.ask: "off" to prevent avoidable approval churn
  • /usr/bin/node explicitly allowlisted in exec-approvals.json
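Put together, the agent definition ended up shaped roughly like this. The key names follow the settings listed above, but treat the surrounding structure as a sketch rather than OpenClaw's exact config schema:

```json
{
  "agents": {
    "mnemospark": {
      "deny": ["subagents"],
      "exec": { "ask": "off" }
    }
  }
}
```

On top of that, `/usr/bin/node` goes into the allowlist in exec-approvals.json so that repeated Node-based CLI invocations never trip the approval flow.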

That got us part of the way there, but not all the way.

The next lesson was even more operational: defining a dedicated agent in config is not enough by itself. The system also has to route work through that agent intentionally.

That became the critical rule:

openclaw agent --agent mnemospark --message "..."

Without that routing step, mnemospark commands could still end up effectively running through the main-agent execution path and inherit all the friction we were trying to get rid of. With that routing step, the dedicated agent policy actually applies.
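One way to make that routing rule hard to forget is a thin wrapper. This is a hypothetical convenience function, not part of OpenClaw itself:

```shell
# Hypothetical wrapper: every interactive mnemospark request is routed
# through the dedicated agent, so its deny/exec policy always applies.
mnemo() {
  openclaw agent --agent mnemospark --message "$*"
}
```

With the wrapper in a shell profile, `mnemo run the backup lifecycle` is the only way operators touch mnemospark, and the main-agent path never gets improvised back into the workflow.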

That is the point where the model stopped being “OpenClaw knows mnemospark exists” and became “OpenClaw knows how to operate mnemospark reliably.”

Once that was in place, the end-to-end tests changed character.

The stack became boring in the best possible way.

We were able to run repeated full-lifecycle tests through the dedicated agent and get clean results across the whole path:

  • backup succeeded
  • price-storage returned a quote
  • upload completed
  • ls reflected the uploaded object and renewal metadata
  • download returned the artifact
  • delete removed the object and cleaned up the cron state

Just as important, those successful runs no longer needed events.jsonl as a rescue tool for every step. The event log remained the source of truth when output clipping or lookup inconsistency appeared, but once the dedicated routing path was working correctly, the normal operator experience became much cleaner.
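When the event log does need to be consulted, a schema-agnostic filter is enough. The helper below makes no assumptions about the log's field names, only that it is one JSON object per line; the log path on a given install is whatever OpenClaw actually writes, so it is passed in rather than hard-coded:

```shell
# Fallback lookup: pull the last few event-log lines mentioning a
# given command, without assuming anything about the log's schema.
last_events() {
  local needle="$1" log="$2"
  grep -i -- "$needle" "$log" | tail -n 5
}
```

Usage looks like `last_events upload path/to/events.jsonl` when an upload's output came back clipped.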

That matters because it draws a bright line between two kinds of engineering:

  • a system that can be made to work by someone who already knows where the bodies are buried
  • a system that works cleanly enough for another agent, another operator, or a future release to inherit without folklore

We wanted the second one.

That is also why the work did not stop at config.

We updated the runbook so it described the real operational shape of the system, not the idealized one. We updated the skill so another agent reading it would learn the same hard-earned lesson: mnemospark is not just a list of commands, it is a workflow with a correct routing model inside OpenClaw.

In practice, that meant teaching three things clearly:

  1. the wallet command is part of the real operator surface
  2. events.jsonl is the fallback source of truth when output is clipped
  3. interactive mnemospark work should be routed through the dedicated mnemospark agent, not improvised from the main agent session

That combination, more than any single command success, is what made the workflow trustworthy.

Demo

In the demo, I show the release-testing loop against the real mnemospark stack:

  1. start with the same test file used for release validation
  2. run backup
  3. run price-storage
  4. run upload
  5. run ls and verify the uploaded object plus renewal metadata
  6. run download
  7. compare the downloaded artifact against the original
  8. run delete
  9. confirm cleanup of the object and cron state
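Step 7 is the one that deserves a real comparison rather than an eyeball check; a byte-for-byte compare is a one-liner:

```shell
# Compare the downloaded artifact against the original, byte for byte.
# cmp -s is silent and exits 0 only when the files are identical.
verify_roundtrip() {
  if cmp -s "$1" "$2"; then
    echo "PASS: downloaded artifact matches original"
  else
    echo "FAIL: downloaded artifact differs from original" >&2
    return 1
  fi
}
```

A nonzero exit from the check fails the whole release gate, which is exactly the behavior you want from step 7.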

That is the standard I now use for mnemospark feature development inside OpenClaw. The goal is not to prove a command exists. The goal is to prove the full operator path works end to end.

What I Learned

The useful outcome is not just that mnemospark can back up a directory.

The useful outcome is that I now have a repeatable end-to-end test loop and a dedicated agent execution model that matches how the plugin actually behaves in the field.

The commands matter. The files matter. The signatures, approvals, and routing rules all matter.

But the larger point is simpler: if an agent-based storage workflow is going to be credible, it has to be tested as a full system and it has to have an execution path designed for the job instead of borrowed from a generic chat loop.

That is what I built for mnemospark.
