Your AI agent should get better every time you talk to it. MetaClaw makes this happen — it's an open-source framework that meta-learns from your real conversations and automatically evolves your agent. The framework's technical report hit #1 on HuggingFace Daily Papers.
I've been using it for a few weeks and loved the concept, but the RL training was locked to cloud backends (Tinker/MinT). I wanted to train on my own GPUs — for privacy, cost, and flexibility. So I forked it and built what I needed.
What I Built
GitHub: OctoClaws/MetaClaw
Landing Page: octoclaws.github.io/MetaClaw
1. Self-Hosted GPU Training Backend
The biggest addition. A complete self-hosted alternative to cloud training:
- FastAPI training server with PEFT/LoRA engine
- vLLM inference with LoRA hot-swap (swap LoRA adapters without reloading the base model)
- 3 loss functions: importance sampling, PPO, CISPO
- Bearer token authentication + checkpoint save/load
- OpenAI-compatible `/v1/chat/completions` endpoint on the training server
Configuration is dead simple:
    rl:
      backend: remote
      remote_url: http://your-gpu-server:8000
      remote_api_key: your-secret-key
Tested end-to-end on 8×A100-SXM4-80GB with Qwen3-8B.
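Because the training server speaks the OpenAI wire format, any OpenAI-compatible client can talk to it. Here's a minimal sketch of building and parsing a `/v1/chat/completions` exchange — the URL, API key, model name, and helper names are my own placeholders, not MetaClaw's actual API:

```python
import json
import urllib.request

def build_chat_request(model, messages, temperature=0.7):
    """Assemble an OpenAI-style /v1/chat/completions request body."""
    return {"model": model, "messages": messages, "temperature": temperature}

def extract_reply(response_body):
    """Pull the assistant text out of an OpenAI-style response dict."""
    return response_body["choices"][0]["message"]["content"]

def chat(base_url, api_key, payload):
    """POST the payload with bearer auth (needs a live server to actually run)."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Build a payload matching the config above (no server contacted here):
payload = build_chat_request(
    "qwen3-8b",  # placeholder model name
    [{"role": "user", "content": "hello"}],
)
```

The bearer token from `remote_api_key` goes in the standard `Authorization` header, so existing OpenAI client libraries work unmodified by pointing their base URL at the training server.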
2. Per-Agent Multi-Agent Isolation
If you run multiple agents through one MetaClaw instance, skills used to bleed across agents. I built full isolation:
- Per-agent skill directories — each agent stores and retrieves skills independently, with a `_shared/` pool for common ones
- Per-agent mode routing — each agent can independently use `skills_only` or `rl` mode
- Per-agent LoRA training — each agent gets its own checkpoint; training one doesn't affect another
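To make the isolation concrete, here's a hedged sketch of how per-agent skill lookup with a shared fallback pool could work — the directory layout and function name are illustrative, not MetaClaw's actual implementation:

```python
from pathlib import Path

def resolve_skill(skills_root, agent_id, skill_name):
    """Return the skill file for an agent, falling back to the _shared/ pool.

    Checks <root>/<agent_id>/ first so an agent-local skill shadows a
    shared one of the same name; returns None if neither exists.
    """
    for scope in (agent_id, "_shared"):
        candidate = Path(skills_root) / scope / f"{skill_name}.md"
        if candidate.is_file():
            return candidate
    return None
```

The lookup order matters: an agent-local skill shadows a shared one of the same name, so individual agents can specialize without mutating the common pool other agents depend on.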
3. Training Engine Bug Fixes
Found and fixed several bugs in the original pipeline:
- Optimizer tracking frozen base model params → wasted memory
- Logprobs computed after temperature scaling → inconsistent distributions
- Gradient checkpointing + KV cache incompatibility → silent failure
- Thread safety issues (lambda closures, runtime `CUDA_VISIBLE_DEVICES` mutation)
- Qwen3 `<think>` tag parsing → reasoning mixed into output
- Multimodal content format → OpenClaw sends `list[dict]`, not plain strings
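The temperature bug is worth a closer look. Sampling divides the logits by the temperature, but if one part of the pipeline records log-probs from the scaled logits while another uses the raw ones, the two distributions disagree whenever T ≠ 1, and any ratio computed across them is silently wrong. A minimal illustration in pure Python (helper names are mine, not the codebase's):

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

logits = [2.0, 1.0, 0.1]
temperature = 0.5

raw_lp = log_softmax(logits)                                # one side's view
scaled_lp = log_softmax([x / temperature for x in logits])  # the other side's view

# With T != 1 the two log-prob vectors differ, so importance ratios
# mixing them are wrong; both sides must read from the same logits,
# which is what the fix enforces.
mismatch = max(abs(a - b) for a, b in zip(raw_lp, scaled_lp))
```

Both vectors are valid log-distributions on their own — the bug only shows up when they are compared against each other, which is exactly what importance-sampling losses do.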
Why Self-Hosted Training Matters
| | Cloud | Self-Hosted |
|---|---|---|
| Privacy | Conversations sent to 3rd party | Stays on your network |
| Cost | Per-token fees | Free if you have GPUs |
| Flexibility | Fixed models/params | Full control |
| Speed | Queue wait times | Train on demand |
Quick Start
    git clone https://github.com/OctoClaws/MetaClaw.git
    cd MetaClaw
    pip install -e ".[rl,evolve]"
    metaclaw setup
    metaclaw start
For self-hosted training, deploy the training server on your GPU machine and set `rl.backend: remote` in your config.
Links
- 🔗 Fork: github.com/OctoClaws/MetaClaw
- 🌐 Landing Page: octoclaws.github.io/MetaClaw
- 📦 Original: github.com/aiming-lab/MetaClaw
- 📄 Paper: arxiv.org/abs/2603.17187
Happy to answer questions about the architecture or setup!