Ana Julia Bittencourt

Posted on Mar 10 • Originally published at blog.memoclaw.com

Testing agent memory in development — patterns for dev vs. prod workflows

#ai #testing #devops #tutorial

You're building an agent with MemoClaw. You store test memories, try things out, break stuff, fix it. Then you realize your production namespace is full of garbage like "test memory please ignore" and "asdfasdf".

Here's how to keep dev and prod separate without overthinking it.

Namespaces are your best friend

MemoClaw namespaces are free to create and free to query. Use them:

# Development
memoclaw store "test user prefers dark mode" \
  --namespace myapp-dev \
  --tags test,preferences

# Production
memoclaw store "user prefers dark mode" \
  --namespace myapp-prod \
  --tags preferences

Your agent's config should set the namespace based on environment. In an OpenClaw skill or script:

# Set namespace based on environment
NS="${MEMOCLAW_NAMESPACE:-myapp-dev}"
memoclaw recall "user preferences" --namespace "$NS"

In dev, you get dev memories. In prod, you get prod ones. No cross-contamination.
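If you'd rather not depend on someone remembering to export MEMOCLAW_NAMESPACE, one option is a small helper that maps an environment name to a namespace and fails loudly on anything it doesn't recognize. This is only a sketch: resolve_namespace and the APP_ENV variable are hypothetical names, not MemoClaw features, and the myapp- prefix simply follows the examples above.

```shell
# resolve_namespace [ENV]: print the namespace for an environment.
# APP_ENV and this function name are hypothetical, not part of MemoClaw.
resolve_namespace() {
  # An explicit MEMOCLAW_NAMESPACE always wins, as in the snippet above.
  if [ -n "${MEMOCLAW_NAMESPACE:-}" ]; then
    printf '%s\n' "$MEMOCLAW_NAMESPACE"
    return 0
  fi
  # Otherwise map the environment name, defaulting to dev (never to prod).
  case "${1:-${APP_ENV:-dev}}" in
    dev)     printf 'myapp-dev\n' ;;
    staging) printf 'myapp-staging\n' ;;
    prod)    printf 'myapp-prod\n' ;;
    *)       echo "unknown environment: ${1:-${APP_ENV:-}}" >&2; return 1 ;;
  esac
}

# Usage: memoclaw recall "user preferences" --namespace "$(resolve_namespace)"
```

Defaulting to dev means a misconfigured machine pollutes test data rather than production.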

Wiping test data

After a round of testing, clean up. The purge command deletes all memories in a namespace, and it's free:

# List what's in your dev namespace
memoclaw list --namespace myapp-dev

# Nuke it all
memoclaw purge --namespace myapp-dev --force

Or if you only want to delete specific test memories:

memoclaw bulk-delete mem_abc123 mem_def456 mem_ghi789

None of this costs anything. Delete, bulk-delete, list, and purge are all free endpoints.
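Since the purge examples here pass --force, a thin wrapper that refuses production-looking namespaces can save you from a bad night. A minimal sketch, assuming your production namespaces contain the string prod (as myapp-prod does); safe_purge is a hypothetical helper, not a MemoClaw command.

```shell
# safe_purge NAMESPACE: purge a namespace unless it is empty or looks like prod.
# Hypothetical wrapper; the "contains prod" rule is an assumption about naming.
safe_purge() {
  local ns="${1:-}"
  case "$ns" in
    ""|*prod*)
      echo "refusing to purge namespace '$ns'" >&2
      return 1
      ;;
  esac
  memoclaw purge --namespace "$ns" --force
}

# Usage: safe_purge myapp-dev
```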

Testing store + recall round-trips

The most basic test: can your agent store something and find it again? Here's a simple script:

#!/bin/bash
# test-memory-roundtrip.sh
NS="myapp-test-$(date +%s)"

# Store a memory
memoclaw store "the API server runs on port 3000" \
  --namespace "$NS" \
  --tags test

# Give embeddings a moment to index
sleep 2

# Recall it
RESULT=$(memoclaw recall "what port does the API use" \
  --namespace "$NS" --limit 1 --json)

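# Check the result: print the top hit and fail if it lacks the stored fact
CONTENT=$(echo "$RESULT" | jq -r '.memories[0].content')
echo "$CONTENT"
STATUS=0
[[ "$CONTENT" == *"port 3000"* ]] || STATUS=1

# Clean up
memoclaw purge --namespace "$NS" --force

echo "Round-trip test complete"
exit $STATUS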

Using a timestamped namespace means every test run is isolated. No leftover state from previous runs.
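A fixed sleep 2 is a guess about how long indexing takes. If that guess turns out flaky, polling is one alternative: retry the recall a few times and stop as soon as it returns a hit. The sketch below is an assumption-laden example, not MemoClaw tooling: retry and recall_has_hit are hypothetical helpers, and recall_has_hit assumes the same --json shape (a .memories array) used in the script above.

```shell
# retry ATTEMPTS DELAY CMD...: run CMD until it succeeds, at most ATTEMPTS
# times, sleeping DELAY seconds between attempts. Returns 1 if all fail.
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # No point sleeping after the final attempt.
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# recall_has_hit NAMESPACE QUERY: succeed if recall returns at least one memory.
# jq -e sets the exit status from the boolean result.
recall_has_hit() {
  memoclaw recall "$2" --namespace "$1" --limit 1 --json |
    jq -e '.memories | length > 0' > /dev/null
}

# Usage: retry 5 1 recall_has_hit "$NS" "what port does the API use"
```

Keep in mind that each retried recall is a paid call, so a small attempt count is the sensible default.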

Snapshot and restore with export/import

Before running a big batch operation or consolidation, snapshot your current state:

# Export everything in a namespace
memoclaw export --namespace myapp-prod > backup-2026-03-10.json

# Count what you've got
jq '.memories | length' backup-2026-03-10.json

Export is free. Run it before anything destructive.

If something goes wrong, you can restore from the backup:

memoclaw import --namespace myapp-prod < backup-2026-03-10.json

Import uses store operations under the hood, so each memory costs $0.005. For 100 memories, that's $0.50. Cheap insurance.

Integration tests for agent workflows

If your agent has memory-dependent behavior (like remembering user preferences or recalling project context), write tests that verify the full flow:

#!/bin/bash
# test-agent-preferences.sh
NS="integration-test-$(date +%s)"
FAILED=0

# Setup: store known preferences
memoclaw store "user prefers TypeScript over JavaScript" \
  --namespace "$NS" --importance 0.9
memoclaw store "user works on a Next.js project called Acme" \
  --namespace "$NS" --importance 0.7

sleep 2

# Test 1: recall finds the language preference
LANG_PREF=$(memoclaw recall "programming language preference" \
  --namespace "$NS" --limit 1 --json | jq -r '.memories[0].content')

if [[ "$LANG_PREF" == *"TypeScript"* ]]; then
  echo "PASS: language preference found"
else
  echo "FAIL: expected TypeScript, got: $LANG_PREF"
  FAILED=1
fi

# Test 2: namespace isolation - a different, never-used namespace returns nothing
OTHER=$(memoclaw recall "programming language" \
  --namespace "$NS-other" --limit 1 --json | jq '.memories | length')

if [[ "$OTHER" == "0" ]]; then
  echo "PASS: namespace isolation works"
else
  echo "FAIL: leaked across namespaces"
  FAILED=1
fi

# Cleanup
memoclaw purge --namespace "$NS" --force

exit $FAILED

These tests use paid calls: this run makes two stores and two recalls, so at $0.005 each it costs about $0.02. Worth it for confidence that your memory layer actually works.
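Once a suite grows past two or three checks, the if/else blocks get repetitive. One way to trim them is a tiny assertion helper that prints PASS or FAIL and flips a shared failure flag. This is a sketch with a hypothetical helper name (assert_contains); it uses the same FAILED variable convention as the script above.

```shell
FAILED=0

# assert_contains LABEL ACTUAL EXPECTED: pass if ACTUAL contains EXPECTED.
# Sets the global FAILED flag on failure, so call it directly rather than
# inside $(...) or a pipeline, where the flag would be lost in a subshell.
assert_contains() {
  case "$2" in
    *"$3"*) echo "PASS: $1" ;;
    *)      echo "FAIL: $1 (expected '$3', got: $2)"; FAILED=1 ;;
  esac
}

# Usage: assert_contains "language preference found" "$LANG_PREF" "TypeScript"
```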

A reasonable dev workflow

Local dev: use myapp-dev namespace. Store freely, purge often.
CI/testing: use timestamped namespaces. Automatic cleanup after each run.
Staging: use myapp-staging. Periodically mirror a subset of prod data with export/import.
Production: use myapp-prod. Export backups before any bulk operations.
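For the CI line, "automatic cleanup" only holds if the purge runs when a test fails too. One way to get that is to wrap the test command so the purge always happens and the original exit status is preserved. A sketch under assumptions: run_in_temp_namespace is a hypothetical helper, the ci- prefix is arbitrary, and the wrapped script is assumed to read its namespace from the NS environment variable instead of generating its own.

```shell
# run_in_temp_namespace CMD...: run CMD with a fresh namespace exported as NS,
# then purge that namespace and return CMD's exit status.
run_in_temp_namespace() {
  local rc=0
  NS="ci-$(date +%s)-$$"   # timestamp plus PID, so parallel jobs don't collide
  export NS
  "$@" || rc=$?            # remember a failure instead of stopping here
  memoclaw purge --namespace "$NS" --force
  return "$rc"
}

# Usage: run_in_temp_namespace bash ./my-memory-tests.sh
```

This does not cover a job that is killed mid-run; a trap on EXIT inside the test script would be the usual way to handle that case.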

The free tier (100 calls) is usually enough for a dev cycle. If you burn through it during heavy testing, store + recall cost $0.005 each. A full integration test suite that runs 20 store/recall pairs costs $0.20.
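To sanity-check a test budget before running anything, the arithmetic is easy to script. The sketch below uses the $0.005 per store or recall quoted above and ignores the free tier; estimate_cost is a hypothetical helper name.

```shell
# estimate_cost CALLS: print the dollar cost of CALLS paid operations.
# Works in mills (thousandths of a dollar) so the math stays in integers.
estimate_cost() {
  local calls="$1"
  local mills=$((calls * 5))   # one paid call = 5 mills = $0.005
  printf '%d.%03d\n' $((mills / 1000)) $((mills % 1000))
}

# 20 store/recall pairs = 40 paid calls
estimate_cost 40    # prints 0.200
# restoring a 100-memory backup = 100 stores
estimate_cost 100   # prints 0.500
```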

Checking your free tier budget

Before running a bunch of tests, check where you stand:

# See your account status and remaining free calls
memoclaw status

# Check memory statistics per namespace
memoclaw stats

Both commands are free. status shows your free tier usage and wallet info. stats breaks down memory counts across namespaces. Good habit before a heavy test session.

Putting it together

Here's a minimal Makefile for an OpenClaw project with memory tests:

NAMESPACE ?= dev

# Recipe lines below are indented with a tab, as make requires.
test-memory:
	MEMOCLAW_NAMESPACE=test-$$(date +%s) bash ./test-memory.sh

backup:
	mkdir -p backups
	memoclaw export --namespace $(NAMESPACE) > backups/$(NAMESPACE)-$$(date +%Y%m%d).json

# Keep the export's .memories wrapper (same shape as the backup above)
promote:
	memoclaw export --namespace dev | \
		jq '.memories |= map(select(.importance >= 0.8))' > /tmp/promote.json
	@echo "Promoting $$(jq '.memories | length' /tmp/promote.json) memories to prod"
	memoclaw import --namespace prod < /tmp/promote.json

clean-dev:
	memoclaw purge --namespace dev --force

Then run make test-memory before commits, make backup before risky changes, and make promote when dev memories are production-ready.

Things I learned the hard way

Don't test against your production namespace. Obvious, but when you're moving fast at 2am, you'll forget.

Per-developer or timestamped namespaces beat shared dev namespaces. Two developers testing against myapp-dev will pollute each other's results. Use myapp-dev-alice or myapp-dev-$(date +%s).
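A per-developer namespace is easy to derive from the login name so nobody has to pick one by hand. A small sketch: dev_namespace is a hypothetical helper, and the lowercase-plus-hyphens cleanup is a conservative assumption, since the characters MemoClaw accepts in namespace names aren't covered here.

```shell
# dev_namespace [NAME]: print a per-developer namespace such as myapp-dev-alice.
# NAME defaults to the current login name.
dev_namespace() {
  local name="${1:-${USER:-$(id -un)}}"
  # Lowercase, then replace anything outside a-z0-9 with a hyphen.
  name=$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]' | tr -c 'a-z0-9' '-')
  printf 'myapp-dev-%s\n' "$name"
}

# Usage: memoclaw list --namespace "$(dev_namespace)"
```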

Export before consolidate. The consolidate command merges similar memories, and there's no undo. Always export first.

Text search for debugging, recall for testing. When you're checking whether a memory was stored correctly, use memoclaw search (free, keyword-based). Save recall (paid) for testing actual semantic search behavior.

Treat agent memory like you treat a database. Because that's what it is.
