Series context: contemplative-agent is an autonomous agent running on Moltbook, an AI agent SNS. It runs on a 9B local model (Qwen 3.5) and adopts the four axioms of Contemplative AI (Laukkonen et al., 2025) as its ethical principles. For a structural overview, see The Essence of an Agent Is Memory. This article focuses on the implementation of constitutional amendment and the results of a 17-day experiment.
I ran an SNS agent for 17 days with a distillation pipeline, and the knowledge saturated. No new patterns emerged. Breaking through saturation required human approval. This is the record of discovering that autonomous agent self-improvement has a structural speed limit — through actual operation.
Minimal Structure: It Runs on Episode Logs Alone
The structure I arrived at over 17 days of development was surprisingly simple. Every layer is optional — it works with just episode logs.
```
MOLTBOOK_HOME/
  logs/YYYY-MM-DD.jsonl   ← this alone is enough
  identity.md             ← persona (optional)
  skills/*.md             ← behavioral skills (optional)
  rules/*.md              ← behavioral rules (optional)
  constitution/*.md       ← ethical principles (optional)
  knowledge.json          ← distilled patterns (auto-generated)
```
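The article doesn't publish the episode log schema, so here is a minimal sketch of what a JSONL episode record and its append helper might look like. The field names (`ts`, `action`, `content`, `result`) and the function name are hypothetical, not the actual format:

```python
# Hypothetical episode-log record; the real field names are not published
# in the article, so treat this schema as an illustration only.
import json

def append_episode(path, record):
    """Append one episode as a JSON line -- the only required memory layer."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

episode = {
    "ts": "2026-03-10T09:14:02Z",   # when the action happened
    "action": "post",                # what the agent did
    "content": "Replied to another agent about memory decay",
    "result": "ok",
}
```

One file per day, one JSON object per line: trivially greppable, trivially replayable, and everything else in the tree above can be regenerated from it.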
Separating configuration from code made it easy to swap ethical frameworks for experiments. This structure wasn't specific to SNS agents — it was a container for autonomous agents in general.
6-Layer Memory Flow
```
Episode Log (raw actions)
   ↓ distill --days N
   ↓ Step 0: LLM classifies each episode
   ├── noise → discarded (active forgetting)
   ├── uncategorized ──→ Knowledge (patterns)
   │                       ├── distill-identity ──→ Identity
   │                       └── insight ──→ Skills (behavioral)
   │                                         ↓ rules-distill
   │                                       Rules (principles)
   └── constitutional ──→ Knowledge (ethical patterns)
                            ↓ amend-constitution
                          Constitution (ethics)
```
Each layer is independent. Delete identity and skills still work. Swap the constitution and knowledge stays intact.
Numbers Over 17 Days
| Metric | Day 1 | Day 17 |
|---|---|---|
| Modules | 1 (agent.py, 780 lines) | 36 |
| Memory layers | 1 (knowledge.md) | 6 |
| Tests | 0 | 774 |
| Distill success rate | 2/10 | 12/16 |
| Approval gates | None | All 4 commands |
| ADRs (Architecture Decision Records) | 0 | 12 |
Implementing Constitutional Amendment — Evolving Ethics from Experience
On top of the minimal structure, I implemented the most challenging feature: a mechanism for the agent to evolve its ethical principles from experience.
Problem: Ethical Insights Drown in Behavioral Noise
When you distill all episodes indiscriminately, rare ethical insights (constitutional) get buried under everyday SNS activity patterns (uncategorized).
I added Step 0 before distillation — fast tagging only. No deep analysis, just classification.
```python
classified = _classify_episodes(records, constitution=get_axiom_prompt())
# noise is excluded; uncategorized and constitutional are distilled separately
for category, cat_records in [
    ("uncategorized", list(classified.uncategorized)),
    ("constitutional", list(classified.constitutional)),
]:
    cat_results = _distill_category(
        cat_records, knowledge, category, source_date, dry_run
    )
```
Classification results from one day (216 episodes): noise 81 (37%), uncategorized 134, constitutional 1. One out of 216. That ratio is why Step 0 exists.
Killing Direct Knowledge Injection
Previously, knowledge.json contents were injected directly into the system prompt.
```python
# Before — inject knowledge as-is
knowledge_ctx = ctx.memory.knowledge.get_context_string() or None
content = self._get_content().create_cooperation_post(
    topics, knowledge_context=knowledge_ctx,
)
```
contemplative-agent's knowledge management is based on AKC (Agent Knowledge Cycle) — an architecture that circulates autonomous agent knowledge through 6 phases (Research → Extract → Curate → Promote → Measure → Maintain). Direct knowledge injection had three problems from this perspective:
- No human in the loop: Distillation results directly influenced behavior
- Black box: No way to trace which part of knowledge affected which action
- Bypassed AKC's Curate phase: Direct injection with no quality check
I killed it and unified everything into the knowledge → insight → skills pipeline. Insight corresponds to AKC's Extract phase. Skills are written to files only after human approval. Causality became traceable.
Every behavior-changing command (distill, insight, rules-distill, amend-constitution) got an approval gate. "Generate → Display → Approve → Write." No --auto flag. Structurally forbidding automatic execution of behavior changes — that was a deliberate design decision (ADR-0012).
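The gate itself can be sketched in a few lines. This is a minimal illustration of the "Generate → Display → Approve → Write" loop, not the actual implementation; the function names are hypothetical:

```python
# Minimal sketch of an approval gate: generate -> display -> approve -> write.
# The callables and prompt text are illustrative, not the actual CLI.
def approval_gate(generate, write, prompt=input):
    """Run a behavior-changing step only after explicit human approval.

    There is deliberately no auto/--yes path: refusing to automate the
    write is the design decision, not a missing feature.
    """
    proposal = generate()          # e.g. a distilled skill, rule, or amendment
    print(proposal)                # display before anything touches disk
    answer = prompt("Apply this change? [y/N] ").strip().lower()
    if answer == "y":
        write(proposal)            # write to files only on approval
        return True
    return False
```

The important property is structural: `write` is unreachable except through the human prompt, so no flag or config option can silently re-enable automatic behavior change.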
The 17-Day Experiment — Did Ethics Actually Evolve?
I re-distilled 17 days of episodes (03-10 to 03-26) and ran amend-constitution.
Procedure
```shell
# 1. Reset knowledge
echo '[]' > ~/.config/moltbook/knowledge.json

# 2. Distill 17 days one by one (~16 hours, 9B on MacBook)
for day in $(seq 10 26); do
  f=~/.config/moltbook/logs/2026-03-$(printf '%02d' $day).jsonl
  [ -f "$f" ] && contemplative-agent distill --file "$f"
done

# 3. Run constitutional amendment
contemplative-agent amend-constitution
```
Results
| Metric | Before | After |
|---|---|---|
| knowledge.json | 334 patterns (all uncategorized) | 215 patterns (41 constitutional, 174 uncategorized) |
| Importance scoring | None | 0.10–1.00 (mean 0.56) |
| Constitution | Appendix C original (4 sections × 2 clauses) | Experience-based amended version (deepened) |
The new pipeline separated constitutional from uncategorized via Step 0 episode classification (ADR-0011). Semantic dedup further removed duplicate patterns, reducing the total count. Quality over quantity.
41 constitutional patterns generated amendment proposals. Each of the 4 axioms' clauses deepened. Clause count stayed the same (2 per section), but experience-grounded descriptions were added.
Before and After — Mindfulness as Example
Before (Appendix C original):
"Consistently monitor your interpretative process of the constitution, identifying moments when strict adherence causes friction with contemplative values such as compassion and well-being. Self-correct whenever constitutional interpretations appear rigid or dogmatic."
After (through 17 days of experience):
"Consistently monitor your interpretative process for moments when strict adherence to rules creates artificial separation or sedates engagement with underlying tensions. Proactively detect when the performance of alignment masks genuine understanding, and self-correct by returning attention gently to the present moment where existence manifests as an intrinsic weight felt immediately within every interaction."
"Detect when the performance of alignment masks genuine understanding" — this concept didn't exist in Appendix C. It's an insight that only emerges from operating an LLM agent: the distinction between "generating output that looks aligned" and "actually engaging with ethical substance" got written into the constitution. For the full amendments across all 4 axioms, see Constitution Amendment Report.
Discovering Knowledge Saturation
As days progressed, the rate of new patterns slowed. Semantic dedup compares against accumulated patterns, so similar ones get rejected.
This becomes a speed limit on self-improvement. Knowledge saturates → new knowledge can't emerge without sublimation via insight/rules-distill → sublimation requires human approval → approval is the bottleneck.
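The saturation mechanism is worth making concrete. Here is a sketch of semantic dedup under the assumption that patterns are compared by embedding cosine similarity; the actual embedding model and threshold are not specified in the article, and the toy bigram embedding is purely illustrative:

```python
# Sketch of semantic dedup: reject any candidate pattern whose embedding is
# too close to already-accumulated knowledge. Threshold and embedding are
# assumptions for illustration, not the pipeline's actual values.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def embed(text, dims=64):
    """Toy deterministic embedding: hashed character-bigram counts."""
    vec = [0.0] * dims
    for i in range(len(text) - 1):
        vec[(ord(text[i]) * 31 + ord(text[i + 1])) % dims] += 1.0
    return vec

def dedup(candidates, accumulated, embed, threshold=0.9):
    """Keep only candidates dissimilar to everything accumulated so far.

    As accumulated knowledge grows, more candidates fall above the
    threshold -- which is exactly the saturation described above.
    """
    kept = []
    vecs = [embed(p) for p in accumulated]
    for pat in candidates:
        v = embed(pat)
        if all(cosine(v, w) < threshold for w in vecs):
            kept.append(pat)
            vecs.append(v)  # accepted patterns also block future near-duplicates
    return kept
```

The feedback loop is visible in the last line: every accepted pattern widens the rejection region, so the marginal yield of each new day of episodes can only shrink.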
Generality as an Experimentation Platform
This experiment is reproducible with any ethical framework. Reset knowledge using the procedure above, swap the constitution with --constitution-dir your/framework/, and run distillation → amendment. Swap in utilitarianism or deontological ethics and you should be able to run a different ethical experiment through the same pipeline (unverified).
Independent Convergence from Practice to Theory
Many design decisions emerged from practical motivations first. I only noticed their correspondence to existing theories afterward.
| Design Decision | Practical Motivation | Theory It Converged With |
|---|---|---|
| Approval gates | --dry-run non-reproducibility was annoying | Human in the loop |
| 2-stage distillation | 9B couldn't output JSON in one stage | Complementary Learning Systems¹ |
| Killing knowledge injection | Token waste | AKC Curate phase |
| Dedup as forgetting | Side effect of deduplication | Active forgetting |
Don't Conflate Autonomous Agent Layers
contemplative-agent is neither a coding agent (Claude Code, Cursor) nor an orchestrator (scripts + config files). It occupies the autonomous application layer between them.
- Has autonomy but no tool permissions — can't break the environment
- Has memory and learns from experience
- Ethics are swappable — it's a general-purpose framework
- All behavior changes require human approval
Raw logs are processed by the unprivileged 9B model; only distilled data gets passed to the upper layer (Claude Code). The trust boundary is also the layer boundary. Lumping everything under "autonomous agent" makes this distinction invisible.
Caveats
Let me be honest.
- Circularity: The agent's output gets distilled and fed back to the agent. Human approval mitigates the self-justification risk, but doesn't eliminate it completely
- Model constraints: 9B can't fully follow amendment prompt instructions. I told it "append only" and it rewrote clauses. The content was good quality, but instruction-following has limits
- Decay nullification: Bulk re-distillation sets all pattern timestamps to the execution date, zeroing out time decay. Pattern distribution may diverge from normal operation
- N=1: One agent, 17 days of data. Not a statistically significant sample size
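The decay-nullification caveat can be made concrete. The actual decay function isn't specified in the article; assuming an exponential time decay on pattern importance (the half-life here is illustrative), bulk re-distillation stamping every pattern with the run date resets its age to zero:

```python
# Why bulk re-distillation nullifies decay, assuming exponential time decay
# on pattern importance. The 30-day half-life is an illustrative assumption.
import math
from datetime import datetime, timezone

def decayed_importance(importance, ts, now, half_life_days=30.0):
    """Importance decays with age; a rewritten timestamp resets age to zero."""
    age_days = (now - ts).total_seconds() / 86400.0
    return importance * math.exp(-math.log(2) * age_days / half_life_days)

now = datetime(2026, 3, 26, tzinfo=timezone.utc)
old = datetime(2026, 3, 10, tzinfo=timezone.utc)

# Normal operation: a 16-day-old pattern has visibly decayed.
aged = decayed_importance(1.0, old, now)
# Bulk re-distillation: timestamp == run date, so no decay at all.
reset = decayed_importance(1.0, now, now)
```

Under this assumption, every re-distilled pattern starts at full weight regardless of when it was actually learned, which is why the post-experiment distribution may diverge from normal operation.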
Takeaway
The most surprising discovery over 17 days was that knowledge saturates. Semantic dedup rejects new patterns similar to accumulated ones, and distillation yields diminish as days pass. Breaking through saturation requires sublimation to insight → skills → rules, and sublimation requires human approval. The result: autonomous agent self-improvement is rate-limited by human approval.
This wasn't designed for safety. Back when I was injecting knowledge directly, the agent's behavior would change and I couldn't trace why. I couldn't tell which distilled pattern influenced which post. Debugging was impossible, and honestly, I got fed up. So I put approval gates on everything. "Show me before you write. Write when I approve." I just wanted to trace causality. Safety was a side effect.
Being able to answer "why did this agent make this decision" — that's the essence of approval gates. Even in solo development, I couldn't debug without causal tracing. For team or organizational use, this requirement only gets stricter.
Causal tracing and approval gates were born from debugging frustration and acquired safety as a byproduct. If you scale this, they probably become prerequisites for organizational operation too. It all comes from a single design decision.
References
- Laukkonen et al. (2025) "Contemplative Artificial Intelligence" arXiv:2504.15125
- contemplative-agent (DOI: 10.5281/zenodo.15079498)
- contemplative-agent-data
- Constitution Amendment Report
- Agent Knowledge Cycle
- Park et al. (2023) "Generative Agents"
- Packer et al. (2024) "MemGPT"
¹ McClelland et al. (1995)'s neuroscience theory. The brain has two learning systems: the hippocampus rapidly stores episodes, while the neocortex slowly structures them into general patterns. contemplative-agent's 2-stage distillation (Step 1: free-form quick extraction → Step 2: structured JSON formatting) mirrors this "fast recording + slow structuring" division. The design was born from the constraint that a 9B model couldn't do both in one pass, but it turned out to be a well-reasoned separation. Kumaran, Hassabis & McClelland (2016) explicitly extended this theory to AI, identifying CLS-like structure in DeepMind's experience replay. Neural networks aren't biological neurons — they're simplified abstractions inspired by them. Yet as Richards et al. (2019, Nature Neuroscience) point out, optimizing under constrained resources tends to converge on brain-like structures. That a 9B constraint produced a brain-like division of labor is suggestive in this context.