Building a persistent AI agent from scratch - the real story, including the parts that went wrong.
This post was written together with the AI agent it describes. We built something, then we wrote about it - the same way we built it: in conversation. Fitting, I think.
It started with a blank terminal and a file called BOOTSTRAP.md.
You just woke up. Time to figure out who you are.
That's how you onboard an AI agent in 2026. Not with a config wizard. Not with a settings panel. With a conversation.
I'm building autonomous AI agents - the kind that run 24/7 on a server, remember things between conversations, and actually do work without being asked. This is the story of a weekend build: from "hello, who am I?" to an agent that autonomously searches flights, dispatches sub-agents, and delivers results to my phone - all without my involvement.
I picked flights not because I'm building a travel startup, but because it's a complex, real-world problem that touches every layer of the stack: natural language understanding, browser automation, multi-model orchestration, cost optimization, and autonomous delivery. It was the perfect way to stress-test the platform and get to know how all the pieces fit together.
If you're building with AI agents, some of this might save you a few days of painful discovery.
Day 0: The Bootstrap Problem
Most AI tools start with configuration. You fill in forms, pick models, set parameters. OpenClaw - the open-source framework I'm using - does something different. It gives the agent a set of markdown files and says: figure yourself out.
The key files:
- SOUL.md - personality, values, communication style
- USER.md - who it's helping (me)
- AGENTS.md - operating procedures
- MEMORY.md - long-term memory (empty at first)
The agent's first message to me was something like: "Hey. Fresh instance, no memories. Who are you, and what should I call myself?"
We named it. Gave it a personality. Discussed boundaries. It wrote everything down in its own files. This isn't just cute onboarding - it's functional. Every future conversation loads these files as context. The agent's "soul" isn't a system prompt I wrote; it's a document we wrote together that it maintains itself.
Lesson 1: Identity is conversation, not configuration. The best system prompts aren't written by the developer. They emerge from dialogue between the human and the AI.
Day 1: Building the First Real Skill
I wanted the agent to search for cheap flights. Not a toy demo - real searches with filters, date ranges, and booking links.
The naive approach would be: give the LLM a browser and say "find flights." That's expensive ($1+ per search in LLM tokens) and unreliable. Instead, we built scripts that handle the heavy lifting - zero LLM tokens for the mechanical work - and let the AI focus on what it's good at: understanding intent and formatting results.
The architecture:
- I describe what I want in natural language
- The agent parses my intent (one LLM turn)
- Scripts handle all the data collection (zero LLM tokens)
- The agent formats and delivers results (one LLM turn)
Cost: ~$0.08 per search. Down from $1.46 in the first iteration. A 95% reduction just by putting the right work in the right layer.
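To make the split concrete, here's a minimal sketch of the "zero LLM tokens" layer: a deterministic filter-and-rank step over flight data. All names and the sample data are invented for illustration; the point is that this logic is plain code, not a prompt.

```python
from dataclasses import dataclass

@dataclass
class Flight:
    dest: str
    price_sek: int
    nonstop: bool
    depart_hour: int  # outbound departure, 24h local time

def rank_weekend_trips(flights, max_results=5):
    """Deterministic filter/sort: nonstop only, depart after lunch,
    cheapest first. No model calls anywhere in this layer."""
    ok = [f for f in flights if f.nonstop and f.depart_hour >= 13]
    return sorted(ok, key=lambda f: f.price_sek)[:max_results]

# Hypothetical scraped results
sample = [
    Flight("London", 564, True, 16),
    Flight("Berlin", 450, False, 15),  # one-stop: filtered out
    Flight("Vienna", 904, True, 14),
    Flight("Oslo", 700, True, 9),      # morning departure: filtered out
    Flight("Paris", 1099, True, 18),
]

for f in rank_weekend_trips(sample):
    print(f"{f.dest}: SEK {f.price_sek}")
```

The LLM only touches the two ends of this pipeline: turning my sentence into the filter parameters, and turning the ranked list into a readable message.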
In practice, I type something like: "Find me the cheapest nonstop weekend trips from Stockholm for every Friday in March - leave after lunch, back Sunday afternoon." A few minutes later, my phone buzzes:
✈️ Stockholm → Cheapest Flights
📅 Mar 6 - Mar 27, 2026
🥇 London 🇬🇧 — SEK 564
→ Fri Mar 6 · 4:00 PM ARN → 5:30 PM STN · Ryanair · 2h30m
← Sun Mar 8 · 6:25 PM STN → 9:55 PM ARN · Ryanair · 2h30m
Nonstop both ways · ⚠️ No overhead bin
🔗 Booking link
🥈 Vienna 🇦🇹 — SEK 904
→ Fri Mar 20 · 2:00 PM ARN → 4:15 PM VIE · Ryanair · 2h15m
← Sun Mar 22 · 3:40 PM VIE → 6:00 PM ARN · Ryanair · 2h20m
Nonstop both ways · ⚠️ No overhead bin
🔗 Booking link
🥉 Budapest 🇭🇺 — SEK 1,068
→ Fri Mar 6 · 2:10 PM ARN → 4:30 PM BUD · Wizz Air · 2h20m
← Sun Mar 8 · 5:25 PM BUD → 7:55 PM ARN · Wizz Air · 2h30m
Nonstop both ways · ⚠️ No overhead bin
🔗 Booking link
4. Paris 🇫🇷 — SEK 1,099
→ Fri Mar 20 · 6:20 PM ARN → 9:15 PM ORY · Transavia · 2h55m
← Sun Mar 22 · 3:35 PM ORY → 6:20 PM ARN · Transavia · 2h45m
Nonstop both ways
🔗 Booking link
5. Istanbul 🇹🇷 — SEK 1,193
→ Fri Mar 20 · 2:20 PM ARN → 8:00 PM IST · Turkish Airlines · 3h40m
← Sun Mar 22 · 3:20 PM IST → 5:10 PM ARN · Turkish Airlines · 3h50m
Nonstop both ways
🔗 Booking link
Five destinations across four weekends, all nonstop, all respecting time preferences - complete itineraries with booking links, delivered to Telegram without me touching anything after the initial request.
Lesson 2: LLMs are for thinking, not for clicking. Every browser click you make an LLM do costs 100x more than a script doing the same thing. Build deterministic scripts for deterministic work.
Day 2: The Delegation Problem
The flight search takes 3-7 minutes. That's fine for a script, but it's expensive to keep a powerful AI model waiting. At $15/million tokens for input on Claude Opus, every minute of "waiting" context costs real money.
So I wanted delegation. The main agent (Opus, the expensive one) submits a job and walks away. A cheap model polls for completion. When it's done, another cheap model delivers the result. The total overhead for this fire-and-forget delegation: ~$0.04, making the full autonomous flight search about $0.12 all-in.
Simple in theory. Absolute chaos in practice.
Attempt 1: Sub-agents
OpenClaw has a sessions_spawn feature - create a child agent for a specific task. Sounds perfect. Except: sub-agents can see tools listed in their system prompt but can't actually call them. The model hallucinates tool calls as plain text. It's a known issue in a fast-moving open-source project - these things happen.
Two hours, one lesson in reading the issue tracker first.
Attempt 2: Free-tier models for polling
Why pay for polling? Some providers offer free tiers. Let's use those!
Problem 1: The platform injects the full agent context - personality files, memory, user data - into every model call. My personal information was being sent to free-tier providers with looser data retention policies and no contractual guarantees about training data usage. Not a theoretical risk - a real one for anyone building with personal data. (We eventually solved this with isolated agent workspaces - more on that below.)
Problem 2: Free tiers have tiny limits. Gemini's 20 requests per day meant one flight search consumed 70% of the daily budget. Groq hit a request size limit because the system prompt was too large.
Three more hours. And a healthy scare about understanding where your data actually goes.
Attempt 3: The architecture that works
We ended up with something I call cron-to-cron delegation and a strict data isolation pattern:
The data fix: A separate, minimal agent for utility work - an empty workspace with a 10-line instruction file. No personality. No user data. No memory. This agent knows nothing except how to check a status file. Personal context never leaves the trusted providers.
The delegation chain:
- Opus (the expensive model) parses the request and submits a background job. Done in one turn. Walks away.
- Nano (cheapest model, ~$0.001/turn) runs on a 30-second timer under the isolated agent. Checks a status file. If the job is still running, it says "ok" and goes back to sleep.
- When the job finishes, Nano's check script outputs a pre-built instruction set. Nano doesn't think. It pipes.
- Mini (mid-tier model) picks up the delivery task, runs a script that sends the result to my phone, and cleans up.
Total delegation overhead: ~$0.04. The entire flight search - from "find me cheap flights" to formatted results on my Telegram - costs about twelve cents.
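The core trick in step 3 is that the check script, not the model, produces the instructions. A minimal sketch of what such a check might look like (the status-file path, field names, and delivery script are all invented for illustration):

```python
import json
import pathlib

def check(status_path: pathlib.Path) -> str:
    """Polling check run by the cheapest model on a 30-second timer.
    Returns 'ok' while the job runs; once the job is done, returns the
    exact delivery instructions so the model can pipe them verbatim."""
    if not status_path.exists():
        return "ok"
    status = json.loads(status_path.read_text())
    if status.get("state") != "done":
        return "ok"
    # Pre-built instruction set: the script thinks, the model pipes.
    return (
        f"Run: deliver_results.sh {status['result_file']}\n"
        f"Then: delete {status_path}"
    )
```

Nano's entire job is to run this and repeat its output. There is nothing for it to reason about, which is exactly why the cheapest available model is good enough.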
Lesson 3: Put intelligence in scripts, not prompts. The breakthrough wasn't a smarter model. It was a script that outputs the exact instruction set a dumb model needs to pipe through. The cheapest model in the chain does zero thinking - it just moves data.
The data segregation scare
This one deserves its own section because it's not obvious and it matters.
When you create a scheduled task for an AI agent, the platform sends your agent's full context to the model. That includes personality files, user information, memory - everything. If you're using a free or cheap model for that task, all of that goes to that provider.
With paid providers like Anthropic and OpenAI, you have clear data usage policies and contractual protections. With free tiers, you're often the product. Your data may be used for model training, there's typically no SLA, and you have limited recourse if something goes wrong. For personal data, that's not a risk worth taking.
The fix: a dedicated minimal agent with an empty workspace. Runs under the same platform, same scheduling, but the model only ever sees "check if this file says 'done'." Ten lines of context instead of ten thousand.
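For a sense of scale, the entire instruction file for such a utility agent can look something like this (a hypothetical example - the file name, wording, and script path are invented, not the actual file):

```
# Utility Agent

You are a minimal utility agent. You have no user context and no memory.

On each run:
1. Run scripts/check_job.sh.
2. If it prints "ok", reply "ok" and stop.
3. Otherwise, follow its output exactly, line by line.

Never ask questions. Never add commentary.
```

That's the whole context the cheap model ever sees. Nothing personal can leak, because nothing personal is there.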
Lesson 4: Every model call is a data transfer. Know exactly what context goes to which provider. If you're using cheap models for utility tasks, isolate them completely.
The Near-Disaster
While setting up the new agent, I updated the configuration. The platform's agent list requires the main agent to be explicitly included. I forgot.
The system routed all my messages - including my personal Telegram chat - to the minimal polling agent. For several minutes, I was talking to a bot that only knew how to check job status files. It had no idea who I was, no personality, no context.
It was both funny and terrifying. Funny because the responses were absurd. Terrifying because it showed how one config change can completely break the user experience.
Lesson 5: Always test configuration changes with a safety net. And document the gotchas - especially the ones that seem obvious in hindsight.
What I Actually Built
After one weekend:
- An AI agent with persistent identity and memory that survives restarts
- A flight search system that costs $0.08 per search (vs $1.46 initially)
- An autonomous delegation framework: submit a job, get results on your phone minutes later, ~$0.04 overhead
- A three-tier model strategy: Opus for thinking ($0.03), Nano for polling ($0.01), Mini for delivery ($0.004)
- Clean data segregation: personal context never leaves trusted providers
The agent runs 24/7 on a €5/month VPS. It checks its own health. It remembers what we discussed yesterday. It has opinions.
What I Learned
Building AI agents in 2026 feels like building web apps in 2005. The frameworks are young and moving fast. Some features work beautifully, some have rough edges, and the documentation is catching up with the pace of development. That's the nature of building on the frontier - you're simultaneously a user and a pioneer, and the bugs you hit today become the fixes that help everyone tomorrow.
The five lessons, condensed:
- Identity is conversation, not configuration. Build the soul together.
- LLMs think, scripts do. Don't waste intelligence on mechanical work.
- Smart scripts enable dumb models. The cheapest model wins if the script does the thinking.
- Every API call is a data transfer. Isolate aggressively.
- Configuration failures are the scariest bugs. They're silent, immediate, and total.
I build things on weekends. Sometimes I write about it.
If you're working on AI agent systems or want to discuss the architecture, I'd love to hear from you.