DEV Community

Roman Shalabanov

0x10 Lessons from Building with OpenClaw and What It Says About the Future of Work

OpenClaw Challenge Submission 🦞

This is a submission for the OpenClaw Writing Challenge


There's a particular kind of quiet that happens at 1 AM when a side project finally clicks.

The terminal is scrolling. The model is thinking. And then something works. Not "works with three caveats and a prayer." Just... works.

That moment hit me while building a Laravel agent integrated with OpenClaw. If you haven't come across OpenClaw yet: it's an AI assistant that can reason, plan, and build its own tools on the fly. Not in a marketing-deck way. In a "hold on, it just wrote a curl command, executed it, and used the result" kind of way.

The hype around it has calmed down a little, which honestly is when things get interesting. Less noise, more actual building.

What I built was a Laravel agent acting as an intelligent wrapper over a SaaS payment API: tracking subscriptions, products, and transactions, and surfacing anomalies. A kind of always-on co-pilot for a store.

But this article isn't really about what I built. It's about what building it taught me.

I've organized these lessons in hexadecimal.

That choice wasn't purely aesthetic.

Hex feels natural when you're close to systems: memory, offsets, low-level boundaries. It also forces a slightly uncomfortable shift in perception. You stop thinking in a linear "1, 2, 3…" and start thinking in a space where structure is more compact, less human-readable at first glance.

And if you read until the end, you'll see why that distortion matters more than it seems.


0x1. Isolation Is Not Optional

OpenClaw can run commands. Install packages. Interact with your system at a surprisingly deep level.

That's the exciting part. That's also the part that can quietly ruin your afternoon.

Running it directly on your main machine is asking for compounding mistakes. A misconfigured env variable here, an overwritten config there, and suddenly you're debugging something that has nothing to do with what you were actually trying to build. Worse, some of those changes are silent. You won't notice until something breaks in a completely unrelated context three days later.

Use a virtual machine. VirtualBox works perfectly. You get a clean sandbox, your real system stays untouched, and you don't need to rent a VPS or buy dedicated hardware just to experiment. Snapshot the clean state before you start. Roll back freely. The cost is a few gigabytes of disk space. The benefit is that experiments stay experiments.

Think of it as mise en place for development. Set up the kitchen before you start cooking.


0x2. Build a Fallback That Doesn't Need a Brain

If you're pairing OpenClaw with an external agent, like my Laravel agent, build a path that works without LLM involvement.

Hardcoded commands. Regex matching. Simple rule-based dispatch. Anything deterministic.

Because tokens run out. APIs go down. Budgets get hit at the worst possible moment. And when that happens, your system shouldn't just stop. My agent handled a lot of repetitive Telegram messages: status checks, balance queries, the same five commands cycling through every day. Why spend tokens asking a model to interpret "status" when three lines of regex handles it perfectly?
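As a sketch, that deterministic layer can be tiny. The commands and canned replies below are hypothetical, not the agent's actual ones:

```python
import re

# Hypothetical rule-based dispatch: each route is a regex plus a handler.
# Anything that matches never touches the LLM.
ROUTES = [
    (re.compile(r"^\s*status\s*$", re.IGNORECASE), lambda: "All systems running"),
    (re.compile(r"^\s*balance\s*$", re.IGNORECASE), lambda: "Balance: ..."),
]

def dispatch(message: str):
    """Return a canned answer if a rule matches; None means 'send to the LLM'."""
    for pattern, handler in ROUTES:
        if pattern.match(message):
            return handler()
    return None  # fall through to the model only when no rule applies
```

The important property is the `None` branch: the model becomes the fallback for the rules, not the other way around.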

This also matters for cron jobs and scheduled tasks. A heartbeat that checks whether everything is running doesn't need GPT. It needs a boolean and a timestamp.

Some things don't need to be intelligent. Some things just need to work.


0x3. The Model Decides More Than You Think

OpenClaw has access to multiple tools: curl, PHP execution, file operations, and more. But it's the model that decides which one to use for any given task. And that choice cascades through everything downstream.

I tested across four models during development: GPT-4o-mini, GPT-5.4-nano, GPT-5.4-mini, and GPT-5.4. Here's what actually happened:

GPT-4o-mini and GPT-5.4-nano are affordable, but shaky on tool selection. And not in a subtle way. The skill explicitly recommended curl. I included example requests. The model looked at all of that and reached for something else entirely, some unrelated tool that had no business being involved in the task. Not a misinterpretation. More like the instructions weren't there at all. That kind of failure is disorienting because you spend the first twenty minutes questioning your skill definition before you realize the model simply isn't reading it the way a stronger model would.

GPT-5.4 is excellent. Consistently chose the right tool, handled multi-step logic cleanly, rarely needed correction. Also expensive enough that sustained use adds up faster than you'd like.

GPT-5.4-mini was the sweet spot. It got curl. It followed complex instructions without wandering. It didn't hallucinate tool names. It didn't drain the budget. For my task, it hit the right balance between predictable behavior and reasonable cost.

# Rough mental model for model selection:

GPT-4o-mini / GPT-5.4-nano  ->  cheap, but unpredictable tool use
GPT-5.4-mini                ->  reliable, reasonable cost  <-- sweet spot
GPT-5.4                     ->  near-perfect, but expensive for sustained use

The lesson isn't "always use the best model." It's that model selection is a product decision, not just a cost decision. A weaker model might save you money per request while costing you much more in debugging time and silent failures.
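One way to see why: fold failure handling into a rough expected-cost formula. Every number below is an illustrative assumption, not a measured price:

```python
def effective_cost_per_task(price_per_call, failure_rate, retries_on_failure,
                            debug_minutes_per_failure, hourly_rate):
    """Expected cost of one task: API spend plus amortized debugging time.
    All inputs are illustrative assumptions, not measured values."""
    api_cost = price_per_call * (1 + failure_rate * retries_on_failure)
    human_cost = failure_rate * (debug_minutes_per_failure / 60) * hourly_rate
    return api_cost + human_cost

# A "cheap" model that misfires often vs. a pricier one that rarely does:
cheap = effective_cost_per_task(0.001, 0.30, 2, 20, 60)  # dominated by debug time
solid = effective_cost_per_task(0.010, 0.02, 1, 20, 60)  # dominated by API price
```

With these made-up inputs, the cheap model's per-task cost is dominated by human debugging time, not API spend, which is exactly the trap described above.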


0x4. curl Is Your Universal Handshake

If your agent exposes an API, OpenClaw can reach it through curl.

And curl is everywhere: Linux, macOS, and Windows (bundled since Windows 10 version 1803). No libraries, no SDK dependencies, no version conflicts, no compatibility matrix to maintain. Just a clean HTTP call that works the same way in every environment you'll ever encounter.

This turned out to be one of the most practically powerful patterns in the whole project. Build your agent with a proper REST interface, and suddenly OpenClaw can talk to it from any environment, with any model, without complex integration work. The agent becomes environment-agnostic. OpenClaw doesn't need to know anything about your stack, your runtime, or your dependencies. It just needs an endpoint and a response.

It sounds obvious in retrospect. Most good ideas do.


0x5. An API Turns an Agent Into an Ecosystem

Once you have an API, you stop thinking about your agent as a single tool.

It becomes infrastructure. Other agents can call it. Other services can feed it. You can build multiple frontends on top of the same core: a Telegram bot, a monitoring dashboard, a cron job, another agent running a completely different task. You can chain behaviors across systems in ways you didn't plan for when you started.

What began as "a Laravel agent for my payment platform" became a composable node. And once it was composable, I started seeing connection points I hadn't considered: different data sources, different consumers, different contexts all talking to the same underlying logic.

Tool to node. That shift in thinking is one of the more underrated things an API-first approach gives you, and it costs almost nothing to build in from the start.


0x6. Lock the Door Before You Open the Window

If you connect OpenClaw to Telegram, here's something easy to forget: anyone who finds your bot's username can try to interact with it.

Telegram bots are publicly addressable by design. That's useful. It's also a problem if your bot is connected to an agent that has system access, can run commands, or can read sensitive data.

Filter by Telegram ID. In a private chat, your user ID and chat ID are the same number, which is a small but confusing detail when you first go looking for it. The easiest way to get it: message @userinfobot and it will return your ID instantly. Alternatively, you can retrieve it by sending a request directly to the Telegram Bot API and reading the from.id field from the response.
OpenClaw also supports an access code mode, where the bot simply asks for a passphrase before doing anything; this may actually be the default behavior. Both approaches work. ID filtering is more seamless for personal use; access codes are easier to share selectively.
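A minimal version of the ID filter, assuming a standard Bot API update payload (the allowlist value is a placeholder):

```python
# Hypothetical webhook guard. The update shape follows the Telegram Bot API's
# Update/Message format; the handler itself is illustrative.
ALLOWED_IDS = {123456789}  # replace with your own Telegram user ID

def is_authorized(update: dict) -> bool:
    """Accept only updates whose sender ID is on the allowlist."""
    sender = update.get("message", {}).get("from", {}).get("id")
    return sender in ALLOWED_IDS

def handle(update: dict) -> str:
    if not is_authorized(update):
        return "ignored"    # silently drop strangers' messages
    return "processed"      # hand off to OpenClaw / the agent
```

Dropping unauthorized messages silently, rather than replying with an error, also avoids confirming to strangers that the bot is live.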

Either way, set this up before you connect the bot to anything real. You're one shared link away from someone else's requests hitting your agent, your API, and potentially your infrastructure.

An unsecured bot connected to an agent with real capabilities is not a theoretical risk. It's a matter of when, not if.


0x7. Token Math Is Lying to You

When you estimate token costs, you're probably only counting the obvious parts: your prompt, the response.

What you're missing:

  • System prompts, which can be substantial depending on your skill definition
  • Heartbeat requests running in the background on a schedule
  • Internal tool-use scaffolding that wraps every tool call
  • Retry logic that kicks in silently when something fails

A request that looks like 300 input / 500 output tokens on paper can easily run 2-3x that in practice. The gap between estimated and actual spend is where projects quietly overshoot their budgets, especially when you're iterating quickly and not watching the dashboard closely.
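A toy estimator makes the gap visible; every default below is an assumption to be replaced with your own measured numbers:

```python
def actual_tokens(prompt, completion, system_prompt=800, tool_overhead=300,
                  tool_calls=1, retry_rate=0.1):
    """Rough real-world token count for one request. All defaults are
    illustrative assumptions, not measured OpenClaw numbers."""
    base = prompt + completion + system_prompt + tool_overhead * tool_calls
    return base * (1 + retry_rate)  # retries resend most of the context

naive = 300 + 500               # what the back-of-envelope math says: 800
real = actual_tokens(300, 500)  # ~2090 with these assumptions, ~2.6x naive
```

The exact multiplier doesn't matter; what matters is that the hidden terms (system prompt, tool scaffolding, retries) scale with usage just like the visible ones.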

Monitor the real numbers. Not the theoretical ones. Not what the API says the last call cost. The total, over time, including everything the system is doing when you're not watching.


0x8. This Is Not Real-Time, and That's Fine

The full message chain looks something like this:

Telegram -> OpenClaw -> Your Agent -> LLM -> Your Agent -> OpenClaw -> Telegram

Each hop adds latency. Network round trips, model inference time, your agent's processing, the response routing back through OpenClaw. Even with fast APIs and a snappy agent, you're looking at seconds per interaction, not milliseconds.
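A simple latency budget shows why; all per-hop numbers here are illustrative, not measurements:

```python
# Illustrative per-hop latency budget (seconds); plug in your own from logs.
HOPS = {
    "telegram_to_openclaw": 0.3,
    "openclaw_to_agent":    0.2,
    "llm_inference":        2.5,   # usually the dominant term
    "agent_processing":     0.3,
    "response_routing":     0.5,
}

total = sum(HOPS.values())  # seconds end-to-end, not milliseconds
```

Even if every non-model hop were free, inference alone keeps the chain in whole-second territory.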

For conversational use cases, that's completely fine. Nobody expects an AI assistant to respond in 50ms. For anything requiring instant reaction, though (alarms, real-time control systems, trading signals, anything where a two-second delay has consequences), this stack is the wrong tool, and no amount of optimization will change that. And if you were wondering whether you could pilot a drone or control a self-driving car through a Telegram/OpenClaw chain: please don't.

Accept it early. Design around it. It saves a lot of architectural regret later.


0x9. ngrok Is Your Best Friend During Development

To receive webhooks locally, you need a publicly accessible URL. Your localhost isn't one.

ngrok and Cloudflare Tunnel both solve this without drama. No port forwarding configuration, no firewall rules, no spinning up a temporary VPS just to test whether a Telegram message arrives correctly. You run one command, get a public URL, point your webhook at it, and start receiving traffic immediately.

Not OpenClaw-specific advice. But the kind of thing that silently blocks an entire category of local development if you don't have it set up. Now you do.


0xA. Dynamic URLs Will Get Your Skill Flagged

If your skill's endpoint URL is assembled at runtime rather than fixed in the definition, expect it to get flagged. The reason is legitimate: dynamic endpoints create a substitution risk. If the URL can be changed by swapping an environment variable, it could potentially point anywhere, including somewhere malicious. ClawHub runs automated internal security checks on published skills, and also scans them through VirusTotal. Dynamic URLs tend to trigger both.

Plan for this before you build, not after. Static, verifiable endpoints are the safer path. Finding out about this after you've built and tested everything is a special kind of frustrating. I say this from direct experience.

The good news: skills on ClawHub are versioned. If something gets flagged or breaks, you can push a fixed version without starting from scratch.


0xB. Tell the Model What Not to Do

Most prompts describe desired behavior. Fewer explicitly name failure modes to avoid.

For models in that tricky middle range, good enough for most things but not perfectly reliable, clear prohibitions are surprisingly effective. You're not just guiding the model toward the right answer. You're closing off the paths where it tends to wander.

Concrete examples from my skill definition:

Do NOT rewrite "how many successful transactions were completed today?" into "recent transactions"
Do NOT rewrite "how many customers paid today?" into "sales"
Do NOT rewrite "which customer paid the most today?" into "recent transactions" unless an exact first call failed
Do NOT answer from your own knowledge when the endpoint can answer the exact question
Do NOT say you need local workspace data before calling the endpoint
Do NOT ask the user where the transactions are stored before calling the endpoint
Do NOT relay a transaction list verbatim when the user asked for a count

Antipatterns, explicitly named, tend to disappear from the model's behavior faster than you'd expect. And fewer wrong turns means fewer unnecessary calls, which means fewer tokens burned on fixing mistakes.


0xC. Give the Model a Chain, Not Just a Description

For complex tasks, explicitly tell the model to use multi-step chaining. Don't assume it will figure out the right approach on its own. And don't just describe the behavior in abstract terms either. Show it what the reasoning pattern should look like.

Here's how I did it in my skill definition. Not rules, but examples of how to think:

"Which customer paid the most today?"

  1. First, call the endpoint with that exact question
  2. Only if it doesn't answer directly, call "recent transactions" to get the list
  3. Parse the JSON, extract amounts and customer info
  4. Sort and filter yourself
  5. Present the final answer

"Give me a full store report"

  1. Call with "status" to get an overview
  2. Call with "recent transactions" to get sales data
  3. Call with "any payment issues?" to get past-due info
  4. Combine into one coherent summary

The examples aren't there to cover every possible case. They're there to show the model how a chain of reasoning should be structured: start with the simplest possible call, decompose only when necessary, combine results yourself, report partial progress on failure.

Once the model has internalized that pattern, it applies it to questions you never explicitly covered.

Abstract descriptions get interpreted, and interpretation introduces variation. Reasoning patterns get internalized and reused. That's the difference between a skill that works on the happy path and one that handles the unexpected.
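The first chain above can be sketched in code. `call_endpoint` and the response fields are hypothetical stand-ins for the skill's actual API:

```python
def top_customer_today(call_endpoint):
    """Mirror of the 'which customer paid the most today?' chain:
    try the direct question first, decompose only if needed."""
    direct = call_endpoint("which customer paid the most today?")
    if direct.get("answer"):                    # step 1: simplest call wins
        return direct["answer"]
    txs = call_endpoint("recent transactions")  # step 2: fall back to the list
    if not txs.get("transactions"):
        return "no data available"              # report partial progress
    best = max(txs["transactions"],             # steps 3-4: parse, sort, filter
               key=lambda t: t["amount"])
    return f"{best['customer']} paid {best['amount']}"  # step 5: final answer
```

Passing `call_endpoint` in as a function keeps the chain testable against stubbed responses, the same way the model is expected to reason against whatever the endpoint returns.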


0xD. Let the AI Fix the AI

After enough failed attempts to manually debug a misbehaving skill, I tried something that felt almost absurd.

I connected OpenClaw to Telegram, described what was going wrong, and asked it to edit its own skill definition.

It worked.

Not always. Not on the first try every time. But often enough that I now treat it as a legitimate debugging strategy rather than a last resort. There's something genuinely interesting about a system that can reflect on its own instructions, identify the mismatch, and correct it. Whether that's impressive engineering or something stranger is a question I leave open.

What I know practically: sometimes the model has a better read on why the skill is failing than I do after staring at it for an hour.


0xE. Quiet Hours Are Underrated

Whether you configure it in the skill definition or in your agent's own logic, building in quiet hours is worth doing early.

If your agent is connected to a messaging platform and checking live data on a schedule, this feature matters more than it sounds. A monitoring agent that sends you a non-urgent observation at 3 AM because a scheduled task ran and found something mildly interesting is a fast path to turning the whole thing off out of frustration.

Set quiet hours. Define what "urgent" actually means in your context. The agent doesn't sleep, but you need to. Protecting that boundary is part of making the system sustainable to run long-term. Attention is the one resource the agent genuinely cannot replace.
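A quiet-hours gate is a few lines; the 23:00-08:00 window below is an example, and the wrap-around-midnight case is the one that's easy to get wrong:

```python
from datetime import time

# Hypothetical quiet-hours window; adjust to your own schedule.
QUIET_START = time(23, 0)
QUIET_END = time(8, 0)

def should_notify(now: time, urgent: bool) -> bool:
    """Suppress non-urgent messages during quiet hours.
    Handles windows that wrap past midnight."""
    if urgent:
        return True
    if QUIET_START <= QUIET_END:
        in_quiet = QUIET_START <= now < QUIET_END
    else:  # window wraps past midnight (e.g. 23:00 -> 08:00)
        in_quiet = now >= QUIET_START or now < QUIET_END
    return not in_quiet
```

The `urgent` flag is where "define what urgent actually means" lives: it's a decision you encode once, not one the agent improvises at 3 AM.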


0xF. One Agent, Multiple Profiles

Even if your agent handles a single domain, it benefits from being split into multiple profiles. Different stores, different accounts, different contexts each get their own profile with their own scope and permissions.

Then add an aggregator profile on top: one that pulls from all the others, surfaces anomalies, identifies patterns across the full picture. The individual profiles answer specific questions. The aggregator profile helps you understand what needs attention across all of them.

That second layer is where the agent stops feeling like a query interface and starts feeling like a collaborator. It's not just retrieving information anymore. It's synthesizing it.
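The aggregator layer reduces to something like this; the profile names, the fetch function, and the anomaly rule are all hypothetical:

```python
def aggregate(profiles, fetch):
    """Pull status from every profile and surface the ones needing attention.
    `fetch` is a stand-in for calling each profile's endpoint."""
    report = {name: fetch(name) for name in profiles}
    anomalies = [name for name, data in report.items()
                 if data.get("failed_payments", 0) > 0]  # illustrative rule
    return {"profiles": report, "needs_attention": anomalies}
```

The individual profiles stay simple query interfaces; the synthesis (deciding what counts as an anomaly) lives in one place on top of them.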


0x10. Heartbeats Are Quietly Expensive

Heartbeat requests, the agent's periodic scheduled runs, are one of the most overlooked cost drivers in agentic systems. They run in the background, they seem small, and they add up continuously.

In OpenClaw, heartbeat cadence is configured per agent via agents.list[].heartbeat.every. This means the right move isn't finding one interval that fits everything; it's splitting tasks across agents with different tempos.

An infrastructure monitoring agent can check every 10 minutes. A news digest agent every 6 hours. A weekly report agent once a day. Each runs at its own pace; you only pay for what you actually need.

You can cut the cost of each individual run further with isolatedSession: true (no full conversation history) and lightContext: true (only HEARTBEAT.md in context); together they drop token usage from ~100K to ~2-5K per run.
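Put together, the setup described above might look something like this config sketch. The agents.list[].heartbeat.every path comes from the text; the surrounding JSON shape, the agent names, and the placement of isolatedSession / lightContext under heartbeat are my assumptions, so check them against OpenClaw's actual config schema:

```json
{
  "agents": {
    "list": [
      { "name": "infra-monitor",
        "heartbeat": { "every": "10m", "isolatedSession": true, "lightContext": true } },
      { "name": "news-digest",
        "heartbeat": { "every": "6h", "isolatedSession": true, "lightContext": true } },
      { "name": "weekly-report",
        "heartbeat": { "every": "24h" } }
    ]
  }
}
```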

This is one of the highest-ROI optimizations in the whole stack. It costs nothing but a few minutes of thinking about how often your data actually changes, and one agent per answer.


That's 0x10 lessons. In decimal, that's 16. Which brings me to the thing I've been building toward since the first line.


0x00. The Lesson That Belongs First

You may have noticed the numbering. We went from 0x1 to 0x10, skipping zero entirely.

That wasn't accidental structure. It was a way to force a re-read at the end.

Because the real lesson isn't visible while building. It only appears once you step back.

Working with OpenClaw didn't change what I was building. It changed the shape of how work gets executed.

At some point you stop thinking in terms of implementation details and start thinking in terms of control flow between systems.

Not "how do I do this", but "what should be responsible for doing this".

That shift doesn't reduce complexity. It redistributes it.

In human systems, communication cost grows non-linearly with scale. The more people you add, the more coordination becomes the bottleneck. In practice, it behaves like O(n²), even if nobody labels it that way.
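That's the classic pairwise-channel count: n people have n(n-1)/2 potential communication paths.

```python
def channels(n: int) -> int:
    """Potential pairwise communication paths among n people: n*(n-1)/2."""
    return n * (n - 1) // 2

# 5 people -> 10 paths, 10 -> 45, 20 -> 190: doubling headcount roughly
# quadruples the coordination surface.
```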

Agent-based systems don't remove communication cost, but they flatten its growth. One person coordinating multiple systems doesn't automatically inherit the same coordination explosion you see in teams.

That difference sounds small, but it changes economics.

Because if coordination stops scaling directly with headcount, scaling stops being purely about adding people.

It becomes about orchestrating systems.

And that subtly shifts the role of the builder.

Developers don't move away from engineering. They move closer to value delivery.

The loop between building and feedback compresses.

Things that used to take weeks of internal iteration now reach users much faster.

And that changes something important: it becomes easier to discover that something is wrong, or unnecessary, early.

Innovation doesn't slow down. It accelerates, because validation becomes cheap.

The boundary between building for yourself and building for the world shrinks.

And with that, roles blur. Developer, operator, founder.

Not because everyone becomes a founder, but because more of the work becomes about deciding what should exist, not just implementing it.

That's why it feels like something is changing.

Not dramatically. Not instantly.

Just structurally.

The system didn't become intelligent.

It became composable.

And that changes what work is.
