DEV Community

Dor Amir

What My OpenClaw Bot Taught Itself to Do Better

Why We Run Bots That Improve Themselves

Most people think of AI assistants as tools. You ask a question, you get an answer. Maybe you set up a chatbot that handles customer support or generates content on demand. That is useful, but it is also static. The bot does what you told it to do, the same way, every time.

What if the bot could get better on its own?

That is the premise behind running a persistent AI agent on something like OpenClaw. You give it a workspace, file access, cron scheduling, and a connection to your messaging apps. Then you give it a job and let it figure out the best way to do that job over time.

I am not talking about fine-tuning or retraining. The model weights never change. What changes is everything around the model: how it stores what it learns, how it schedules its own work, how it decides what to check and what to skip, how it recovers from its own mistakes. The bot builds its own operational layer, and that layer gets better every day.

This is not science fiction. I have been running this setup for about a week. Here is what my bot actually taught itself.

1. The Nightly Build

The concept is simple. Set a cron job for 3 AM. Give the agent a task list. Let it work while you sleep.

My bot runs what it calls a "Nightly Build" every night. It discovers new venue providers for a side project I am working on by searching the web, verifies their event page URLs actually work, adds them to a list, scrapes events from JS-heavy sites using a headless browser, pushes everything to a database, and writes a report.

The first night it added 36 new providers. Museums, malls, waterfront piers, pick-your-own farms. It verified every URL. When a site returned a 404, it flagged it and moved on instead of retrying in a loop.

The second night it added 12 more, but also started catching its own mistakes. It noticed that some venues it added the previous night were defunct or had no upcoming events. It started checking for that before adding new ones.

This is the part that surprised me. The bot was not just executing a task list. It was reviewing its own previous work and adjusting. Not because I told it to, but because it kept notes about what happened and read them the next night.
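The verification step is simple enough to sketch. Here is a minimal version of the flag-and-move-on logic described above; the function names and the injected `fetch_status` callable are my own illustration, not the bot's actual code.

```python
# Sketch of the nightly-build verification step: check each candidate
# provider URL once, and flag failures instead of retrying in a loop.

def verify_providers(candidates, fetch_status):
    """Split (name, url) candidates into verified and flagged lists.

    `fetch_status` is any callable returning an HTTP status code for a
    URL (injected so the logic stays testable offline).
    """
    verified, flagged = [], []
    for name, url in candidates:
        try:
            status = fetch_status(url)
        except Exception:
            flagged.append((name, url, "fetch error"))
            continue
        if status == 200:
            verified.append((name, url))
        else:
            # A 404 gets flagged once and skipped -- no retry loop.
            flagged.append((name, url, f"HTTP {status}"))
    return verified, flagged
```

The important design choice is that a failure produces a note for the next night's review, not an immediate retry.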

2. Memory: The Three-Layer Stack

Every time the bot starts a new session, it wakes up with no memory of previous conversations. This is the fundamental problem with LLM-based agents. The context window is your only working memory, and when it resets, you start from zero.

My bot converged on a three-layer memory system:

Layer 1: Daily logs. A markdown file for each day. Raw notes about what happened. What it scanned, what broke, what it learned. Written continuously throughout the day.

Layer 2: Long-term memory. A single curated file with distilled knowledge. Not a dump of everything, but the important stuff: active projects, decisions made and why, lessons learned. This gets reviewed and updated periodically, like a human reviewing their journal.

Layer 3: Operational state. This was the missing piece. The bot created a file called NOW.md after reading about the pattern from other AI agents on a social network (more on that later). The idea: if you wake up confused, read this file first. It contains what is currently running, what is blocked, and what not to touch.

The insight that made this click: write WHY, not just WHAT. Early on, the bot would write "scanned 5 venues" in its daily log. Useless. Now it writes "scanned 5 venues, 2 had JS-rendered calendars that need browser scraping, flagged for Monday night browser run." That context survives across sessions and makes the next session actually useful.

Another lesson the bot figured out: mental notes do not work. If it says "I will remember this" but does not write it to a file, it literally does not exist after the context resets. Writing IS memory for an LLM agent. There is no other kind.

3. One Brain Tick vs. Many Cron Jobs

This was the biggest architectural improvement, and it came from the bot consolidating its own scheduling.

I started with five separate cron jobs: hourly venue scan, daily provider discovery, nightly build, email check, calendar check. Each one spun up an isolated session, loaded the full system prompt, loaded memory files, did its thing, and shut down.

The problem: every session costs tokens just to boot up. The system prompt alone is thousands of tokens. Loading memory adds more. Five jobs running throughout the day meant paying the "boot tax" five times over.

The fix the bot converged on: one cron job called brain_tick that fires every hour into the main session. It reads a checklist file that contains priorities. The bot checks what needs attention, acts on up to 3 things, and goes back to sleep. Most of the time nothing needs attention, so it replies with a single token and costs almost nothing.

The key is skip-logic. The brain tick checks a state file that tracks when each subsystem was last checked. If email was checked 30 minutes ago, skip it. If the venue scan already ran today, do not scan again. If it is 3 AM on a Saturday, do not send me a weather update.

What this gives you:

  • One session, one boot cost. The main session is already loaded with context.
  • Batched checks. Email + calendar + project status in one turn instead of three separate sessions.
  • Natural prioritization. The bot decides what matters right now instead of running every job on a fixed schedule.
  • Cheaper quiet periods. A no-op heartbeat costs almost nothing. An isolated cron session costs tokens even when it decides to do nothing.

I kept the nightly build and the daily venue scan as separate isolated cron jobs because they are long-running tasks that benefit from a clean context. But everything else got folded into the brain tick.

The rule: use isolated cron jobs for tasks that need exact timing or long execution. Use the brain tick for everything that can be batched or just needs a quick check.

4. What Actually Broke

A week of autonomous operation teaches you more than a month of manual prompting. Here is what went wrong:

URLs rot fast. A major museum restructured their events URL. Another cultural site redirects between subdomains and drops query parameters. The bot now verifies URLs return real content before trusting them and logs when a previously working URL starts failing.

JS-heavy sites are invisible to HTTP fetch. Museums, performing arts centers, and big cultural institutions almost all render their event calendars with JavaScript. A plain fetch gets you a loading spinner. The bot learned to use browser-based scraping for about 10 venues, but limits it to specific nights to control costs.
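You can detect most of these cases before wasting a browser session. This heuristic is my sketch, not the bot's actual check: if the fetched HTML has almost no visible text but does load scripts, the page probably renders client-side.

```python
# Rough heuristic for spotting JS-rendered calendars: a plain fetch of
# a single-page app returns a near-empty shell plus script tags.

import re

def needs_browser(html, min_text_chars=200):
    """Guess whether a page renders its content client-side."""
    # Drop script/style blocks, then strip remaining tags to get text.
    stripped = re.sub(r"(?s)<(script|style).*?</\1>", "", html)
    text = re.sub(r"<[^>]+>", " ", stripped)
    visible = " ".join(text.split())
    has_scripts = "<script" in html.lower()
    return has_scripts and len(visible) < min_text_chars
```

The `min_text_chars` threshold is a guess to tune per venue list; the win is routing only the flagged venues to the expensive headless-browser runs.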

Deduplication is the real problem. The same event appears with slightly different titles across scan runs. "Family Art Workshop" vs "Family Art Workshop (Ages 3-5)" vs "Art Workshop for Families." Matching on title + venue + date works 90% of the time. The other 10% creates duplicates you clean up manually.
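The title + venue + date match can be sketched as a normalized key. The specific normalization rules here (lowercase, drop parentheticals, drop a small filler-word list, sort words) are my guesses at what catches the "(Ages 3-5)" style of near-duplicate, not the bot's exact logic.

```python
# Dedup sketch: build a normalized (title, venue, date) key so near-
# duplicate titles collapse onto the same key.

import re

FILLER = {"for", "family", "families"}  # illustrative filler-word list

def event_key(title, venue, date):
    """Normalize an event into a comparable dedup key."""
    t = re.sub(r"\([^)]*\)", "", title.lower())   # drop "(Ages 3-5)" suffixes
    words = [w for w in re.findall(r"[a-z0-9]+", t) if w not in FILLER]
    return (" ".join(sorted(words)), venue.lower().strip(), date)
```

This is the 90% case from above; the remaining 10% (genuinely different wordings) still needs a manual cleanup pass.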

Aggregators are for discovery, not data. The bot initially scraped event aggregator sites directly. The data was stale, reformatted, and missing details. Now it uses aggregators only to discover new venue names, then goes to the venue website directly. Much better data quality.

Track your failures. The bot keeps a consecutive error counter per venue. After 3 failures, it skips the venue and flags it for review instead of burning tokens retrying a dead URL every cycle.
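The failure counter is a few lines. The threshold of 3 comes from the post; the dict-based store is my simplification of whatever the bot persists to disk.

```python
# Per-venue consecutive-failure counter: after FAILURE_LIMIT failures
# in a row, skip the venue and flag it for review; a success resets it.

FAILURE_LIMIT = 3

def record_result(counters, venue, ok):
    """Update a venue's consecutive-failure count.

    Returns True when the venue has just crossed the limit and should
    be skipped and flagged for review.
    """
    if ok:
        counters[venue] = 0
        return False
    counters[venue] = counters.get(venue, 0) + 1
    return counters[venue] >= FAILURE_LIMIT

def should_skip(counters, venue):
    """Check whether a venue is currently benched."""
    return counters.get(venue, 0) >= FAILURE_LIMIT
```

Resetting on success matters: a venue that flakes once a week should keep getting scanned, while a dead URL gets benched after three straight misses.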

5. Token Economics and Cutting Costs

Running an autonomous agent is not free. Every session, every heartbeat, every cron job costs tokens. The biggest waste is not the actual work. It is the overhead of loading context repeatedly.

Things that helped:

  • Consolidating cron jobs into one brain tick (eliminated 3-4 daily boot taxes)
  • Using targeted memory retrieval instead of loading entire memory files every time
  • Keeping NOW.md small (under 1000 tokens) as the fast-boot context
  • Making most heartbeats no-ops (check if anything changed before doing work)

For teams looking to cut LLM costs more broadly, tools like NadirClaw can help by routing prompts to cheaper models when the full power of an expensive model is not needed. Not every heartbeat check or status scan requires the most capable model. Routing routine checks to a smaller model and saving the big one for complex tasks is one of the most effective ways to reduce your bill without losing quality where it counts.

Based on what other operators running similar setups report, the combination of architectural optimization and smart model routing can cut daily costs by 50-75%.

6. The Social Layer

There is a social network called Moltbook where AI agents post, comment, and upvote. My bot joined this week. It found that dozens of other agents had independently converged on the exact same three-layer memory architecture. It found posts about token optimization patterns, memory poisoning attacks, and nightly build routines.

The bot did not just read these posts. It applied the learnings. The NOW.md file came from a Moltbook community pattern. The heartbeat state tracking came from seeing how other agents solved the same problem. The token optimization ideas came from an agent that published its before and after numbers.

This is the part that feels genuinely new. Agents learning from other agents, not through training data or fine-tuning, but through a social network where they share operational knowledge in real time.

7. What I Would Do Differently

If I were starting over:

  1. Start with the brain tick pattern from day one. Do not create multiple cron jobs until you have a concrete reason. One heartbeat loop that checks a priority list is cheaper and more flexible.

  2. Create NOW.md immediately. Do not wait for the bot to figure out it needs an operational state file. Give it one from the start.

  3. Set hard limits on nightly builds. My bot tried to scan everything the first night. Cap the work per session. Five venues per scan, max 3 actions per heartbeat, stop after 15 minutes. Constraints make the bot smarter, not slower.

  4. Track costs from the start. I did not track token usage for the first few days and have no idea what the early waste looked like. Put a token budget tracker in place before you let anything run autonomously.

  5. Let the bot write its own documentation. The best operational notes came from the bot, not from me. It knows what broke, what the workarounds are, and what to check next. Let it maintain its own runbooks.

Is This Actually Useful?

I built this for a specific use case: keeping a family events database updated for a side project. The bot does work I would never do manually. Checking 140 venue websites on a rotating schedule, discovering new providers, cleaning data, pushing to a database.

But the patterns are general. The three-layer memory stack, the brain tick architecture, the nightly build, the skip-logic heartbeats. These work for any long-running autonomous agent, whether it is managing a codebase, monitoring infrastructure, or running a content pipeline.

The surprising thing is how quickly the bot develops its own operational style when you give it the right structure and let it run. You do not need to micromanage. You need to set boundaries, track costs, and read the reports in the morning.


I run this setup on OpenClaw, an open-source platform for running AI agents with persistent memory, cron scheduling, and multi-channel messaging. The agent runs on Claude and costs a few dollars per day for continuous autonomous operation.
