Vicente Junior

Posted on May 23

I Built a WhatsApp Finance Agent in OpenClaw. Migrating to Hermes Taught Me What "Self-Improving" Actually Means.

#hermesagentchallenge #devchallenge #agents


This is a submission for the Hermes Agent Challenge: Write About Hermes Agent

TL;DR

I built Finn — a personal finance agent for WhatsApp — on OpenClaw, in TypeScript, with a single agent and six tools.

Migrating to Hermes Agent isn't a framework swap. It's a paradigm shift from "an agent that executes" to "an agent that learns."

The hermes claw migrate command makes the OpenClaw → Hermes path official. The gains are real (persistent memory, skill-based abstractions, multi-platform gateway), and so are the tradeoffs.

Meet Finn

A few months ago I built Finn, a personal finance assistant that lives entirely inside WhatsApp. No app to install, no dashboard to remember — just a chat:

You:  spent 45 on lunch
Finn: $45 · Food · Mastercard · Me · 2026-04-22 — confirm? ✅
You:  sim
Finn: ✅ Saved!

Under the hood, Finn is an OpenClaw plugin written in TypeScript. It runs a single agent with a gpt-4.1 tool-use loop (capped at 5 iterations), persists transactions to Supabase, parses credit card PDFs (text-based, with OCR via gpt-4o as a fallback), and transcribes voice notes through Whisper. It replies in Portuguese or English, matching the language of the last message it received.

I shipped it for the OpenClaw Challenge 2026; the full demo, code samples, and architecture walkthrough are in that submission. It works. I use it every day.

And then I read the Hermes Agent docs.

This post is about what happens when you take a working production agent built on OpenClaw and ask: what would this look like in Hermes, and is the upgrade worth it?

What Finn Already Does Well in OpenClaw

Before I criticize anything, let me be fair to the framework I picked.

OpenClaw gave me a clean abstraction that mapped naturally to "personal agent on WhatsApp":

A before_dispatch hook that intercepts incoming messages and routes them through my plugin.
A simple tool-use loop — define a JSON schema, expose handler functions, let the model call them. Max 5 iterations per turn keeps cost and latency predictable.
Native WhatsApp gateway through OpenClaw's connector model.
Phone whitelist at the gateway level, before my code even sees the message.
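To make the capped loop concrete, here is a minimal sketch. It is not OpenClaw's API: `ModelStep`, `runToolLoop`, and the scripted model are hypothetical stand-ins I made up for illustration. Only the cap of five iterations comes from Finn.

```typescript
// Hypothetical types: a model step either calls a tool or returns a final reply.
type ModelStep =
  | { kind: "tool_call"; tool: string; args: Record<string, unknown> }
  | { kind: "final"; text: string };

type ToolHandler = (args: Record<string, unknown>) => string;
type Model = (history: string[]) => ModelStep;

const MAX_ITERATIONS = 5; // the per-turn cap Finn uses

function runToolLoop(
  userMessage: string,
  model: Model,
  tools: Record<string, ToolHandler>,
): string {
  const history: string[] = [`user: ${userMessage}`];
  for (let i = 0; i < MAX_ITERATIONS; i++) {
    const step = model(history);
    if (step.kind === "final") return step.text;
    const handler = tools[step.tool];
    const result = handler ? handler(step.args) : `unknown tool: ${step.tool}`;
    history.push(`tool ${step.tool}: ${result}`);
  }
  // Cap reached: stop instead of letting the model keep calling tools.
  return "Sorry, I could not finish that request.";
}

// Scripted stand-in for the model: one tool call, then a final answer.
const scriptedModel: Model = (history) =>
  history.length === 1
    ? { kind: "tool_call", tool: "query_spending", args: { category: "Food" } }
    : { kind: "final", text: `Done. ${history[history.length - 1]}` };

const reply = runToolLoop("how much on food?", scriptedModel, {
  query_spending: () => "Food total: 45.00",
});
// reply is "Done. tool query_spending: Food total: 45.00"
```

The cap is what keeps cost and latency bounded: a model that never produces a final answer still costs at most five tool calls per turn.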

The six tools I exposed to the agent map directly to the things a finance assistant needs to do:

Tool	Purpose
`save_transaction`	Persist a confirmed expense, income, or card payment
`query_spending`	Query totals and breakdowns from the database
`save_bulk_transactions`	Bulk-save invoice items from a PDF import
`save_bank_statement`	Bulk-save bank statement rows
`update_transaction`	Edit a saved record (with confirmation)
`delete_transaction`	Delete a saved record (with confirmation)
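As a rough illustration of the pattern, a tool pairs a JSON schema with a handler, and a small gate keeps mutating tools from running until the user has confirmed (the "confirm?" step in the chat above). The `ToolDefinition` shape, the `mutates` flag, and `callTool` are assumptions for this sketch, not OpenClaw's plugin API, and an in-memory array stands in for Supabase.

```typescript
// Hypothetical shapes; not OpenClaw's actual plugin API.
interface ToolDefinition {
  name: string;
  description: string;
  mutates: boolean; // true for tools that write to the database
  parameters: {
    type: "object";
    properties: Record<string, { type: string; description?: string }>;
    required: string[];
  };
  handler: (args: Record<string, unknown>) => string;
}

const saved: Array<Record<string, unknown>> = []; // in-memory stand-in for Supabase

const saveTransaction: ToolDefinition = {
  name: "save_transaction",
  description: "Persist a confirmed expense, income, or card payment",
  mutates: true,
  parameters: {
    type: "object",
    properties: {
      amount: { type: "number", description: "Amount in the account currency" },
      category: { type: "string" },
      date: { type: "string", description: "ISO date, e.g. 2026-04-22" },
    },
    required: ["amount", "category", "date"],
  },
  handler: (args) => {
    saved.push(args);
    return `saved record #${saved.length}`;
  },
};

// Reject calls that omit a required argument, and refuse to run a
// mutating tool unless the user has confirmed.
function callTool(
  tool: ToolDefinition,
  args: Record<string, unknown>,
  userConfirmed: boolean,
): string {
  const missing = tool.parameters.required.filter((key) => !(key in args));
  if (missing.length > 0) return `missing arguments: ${missing.join(", ")}`;
  if (tool.mutates && !userConfirmed) return "confirmation required";
  return tool.handler(args);
}

const lunch = { amount: 45, category: "Food", date: "2026-04-22" };
const beforeConfirm = callTool(saveTransaction, lunch, false); // "confirmation required"
const afterConfirm = callTool(saveTransaction, lunch, true); // "saved record #1"
```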

Every mutation requires explicit user confirmation before the tool is called. The whole architecture fits on one page:

WhatsApp
    │
    ▼
OpenClaw Gateway  ──(before_dispatch hook)──▶  Finn Plugin (TypeScript)
                                                  │
                            ┌─────────────────────┤
                            ▼                     ▼
                       Text / Audio              PDF
                            │                     │
                       runAgent()           Custom parsers
                       (gpt-4.1 loop)      (text + OCR fallback)
                            │                     │
                            └──────────┬──────────┘
                                       ▼
                                  Supabase

It's a clean design. I'm proud of it. But the longer I used Finn, the longer my wishlist got.

The Walls I Hit

After a few weeks of daily use, three things started bothering me:

1. Finn forgets everything between conversations.
When I say "roxinho" I mean my Nubank card. When I say "feira" I mean grocery store. Today, those mappings live in the system prompt. They don't grow. If next week I start using "Itubinho" for Itaú, I have to edit the prompt and redeploy. The agent is not learning. I am.

2. Finn only lives on WhatsApp.
If I want the same assistant on Telegram (where my parents are), on Discord (where my work crew lives), or on Signal (where I keep some chats), I have to build a new connector each time, or fork the plugin and reinstall it elsewhere.

3. Every new capability is a code change.
The "saldo-diff algorithm" I built for Bradesco statements is custom code, deployed via my own deploy.sh script. If I want a Nubank statement parser tomorrow, that's a new TypeScript file, a new test suite, a new deploy. There is no abstraction layer between "I figured out a new way to do something" and "I shipped TypeScript to production."

These aren't OpenClaw's fault, really. They are the limits of building on a framework that thinks of agents as executors of tools — the agent's job is to pick the right tool, with the right arguments, in the right order, and stop.

What I wanted was an agent that accumulated knowledge the way I do. That was when Hermes started looking interesting.

The Shift Hermes Makes: From Tools to Skills

Here is the core conceptual change.

In OpenClaw, the agent's capability surface is a list of tools. Each tool is a function with a JSON schema. The agent picks one, calls it, observes the result, and decides what to do next.

In Hermes, the agent's capability surface is a list of skills. A skill is a markdown file — SKILL.md — with YAML frontmatter and instructions. The agent loads skills on demand, using progressive disclosure: at level zero it only sees skill names and short descriptions (around 3k tokens for the whole catalog), at level one it loads the full skill content, at level two it can pull additional reference files inside the skill's directory. Tools still exist (Hermes ships 70+ of them across 28 toolsets), but skills are the primary abstraction for how the agent works on a problem.
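The three disclosure levels are easier to see in code. This is a sketch under assumptions: the `Skill` shape, the second skill, and the `categories.md` reference file are all hypothetical, and Hermes' real loader reads files from disk rather than an in-memory array.

```typescript
// Hypothetical in-memory model of a skill directory; Hermes' real loader may differ.
interface Skill {
  name: string;
  description: string;
  body: string; // full SKILL.md instructions
  references: Record<string, string>; // extra files in the skill's directory
}

const skills: Skill[] = [
  {
    name: "log-expense",
    description: "Log a personal expense with smart defaults",
    body: "## Procedure\n1. Parse the amount.\n2. Apply defaults.",
    references: { "categories.md": "Food, Transport, Coffee" },
  },
  {
    name: "import-statement",
    description: "Import rows from a bank statement PDF",
    body: "## Procedure\n1. Extract the text.\n2. Run the parser script.",
    references: {},
  },
];

// Level 0: only names and short descriptions go into the prompt.
function catalog(all: Skill[]): string {
  return all.map((s) => `${s.name}: ${s.description}`).join("\n");
}

// Level 1: the full instructions for one skill, loaded on demand.
function loadSkill(all: Skill[], name: string): string | undefined {
  return all.filter((s) => s.name === name)[0]?.body;
}

// Level 2: a single reference file from inside that skill's directory.
function loadReference(all: Skill[], name: string, file: string): string | undefined {
  return all.filter((s) => s.name === name)[0]?.references[file];
}

const level0 = catalog(skills);
const level1 = loadSkill(skills, "log-expense");
const level2 = loadReference(skills, "log-expense", "categories.md");
```

The point of the layering is prompt budget: the catalog stays small no matter how long individual skills get, because bodies and reference files are only loaded when a skill is actually selected.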

This is what Finn's save_transaction tool would look like as a Hermes skill:

---
name: log-expense
description: Log a personal expense to the finance database with smart defaults
version: 1.0.0
metadata:
  hermes:
    tags: [finance, personal, daily]
    category: finance
    requires_toolsets: [terminal]
---

# Log Expense Skill

## When to Use
The user mentions a purchase, payment, or expense, with or without a category.
Examples: "spent 45 on lunch", "paid Netflix 55.90", "almoço 35 ontem".

## Defaults
- Card: Mastercard (unless user names another)
- Cost center: Me (unless user names another person)
- Date: today (unless user names another date)
- Never ask about defaults — apply them silently.

## Procedure
1. Parse amount, description, and any explicit metadata from the message.
2. Apply defaults to anything not specified.
3. Echo the parsed transaction back to the user for confirmation.
4. On confirmation ("sim", "yes", "✅"), write to Supabase via the terminal tool.
5. Reply with a success message including the new balance for the category.

## Pitfalls
- Do not trust relative dates ("last Tuesday") without confirming the actual date.
- Round currency to 2 decimal places when displaying.
- Watch for duplicate detection warnings before saving.

## Verification
The success message includes the saved record ID and the running category total.

That is the entire "save_transaction" capability, expressed as procedural knowledge instead of a function signature. There's no TypeScript to compile, no schema to maintain in two places, no deploy. If next month I figure out that "Wednesday is my coffee day, default the category to Coffee on Wednesdays unless overridden," I add three lines to the markdown file. Done.
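If you want to inspect a skill file from your own tooling, splitting frontmatter from body takes a few lines. This is a deliberately minimal sketch: `parseSkillFile` is a hypothetical helper, it only reads flat top-level `key: value` lines, and it skips nested YAML such as the `metadata:` block, for which a real YAML parser would be the right tool.

```typescript
// Minimal sketch: handles only flat top-level `key: value` lines.
// Nested YAML (like the `metadata:` block) is skipped, not parsed.
interface ParsedSkill {
  fields: Record<string, string>;
  body: string;
}

function parseSkillFile(source: string): ParsedSkill {
  const lines = source.split("\n");
  if (lines[0].trim() !== "---") return { fields: {}, body: source };
  const end = lines.indexOf("---", 1); // closing delimiter of the frontmatter
  if (end === -1) return { fields: {}, body: source };

  const fields: Record<string, string> = {};
  for (const line of lines.slice(1, end)) {
    // Unindented key followed by a non-empty scalar value.
    const match = /^([A-Za-z_]+):\s*(\S.*)$/.exec(line);
    if (match) fields[match[1]] = match[2].trim();
  }
  return { fields, body: lines.slice(end + 1).join("\n").trim() };
}

const sample = [
  "---",
  "name: log-expense",
  "description: Log a personal expense to the finance database with smart defaults",
  "version: 1.0.0",
  "metadata:",
  "  hermes:",
  "    category: finance",
  "---",
  "",
  "# Log Expense Skill",
].join("\n");

const parsed = parseSkillFile(sample);
// parsed.fields.name is "log-expense"; parsed.body is "# Log Expense Skill"
```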

But the bigger shift isn't the syntax. It is that the agent itself can write and update these skills through a tool called skill_manage. This is the agent's procedural memory: when it figures out a non-trivial workflow that worked, it can save the approach as a new skill for next time. Hermes will offer to crystallize what it learned into a reusable skill at three kinds of moments: after completing a complex task with 5+ tool calls, after the user corrects its approach, or after it hits errors and finds the working path.
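Those three triggers can be written down as a simple predicate. To be clear, this is my own sketch of the idea and not how Hermes implements it: the `TaskTrace` fields and `reasonsToOfferSkill` are hypothetical, and the only thresholds used are the ones named above.

```typescript
// Hypothetical summary of one finished task. The fields are assumptions
// derived from the three triggers described in the text, nothing more.
interface TaskTrace {
  toolCalls: number;
  userCorrected: boolean;
  errorsBeforeSuccess: number;
  succeeded: boolean;
}

function reasonsToOfferSkill(trace: TaskTrace): string[] {
  if (!trace.succeeded) return []; // only a working approach is worth saving
  const reasons: string[] = [];
  if (trace.toolCalls >= 5) reasons.push("complex task (5+ tool calls)");
  if (trace.userCorrected) reasons.push("user corrected the approach");
  if (trace.errorsBeforeSuccess > 0) reasons.push("recovered from errors");
  return reasons;
}

const simpleTask = reasonsToOfferSkill({
  toolCalls: 2,
  userCorrected: false,
  errorsBeforeSuccess: 0,
  succeeded: true,
}); // [] : nothing worth crystallizing

const hardTask = reasonsToOfferSkill({
  toolCalls: 7,
  userCorrected: true,
  errorsBeforeSuccess: 0,
  succeeded: true,
}); // ["complex task (5+ tool calls)", "user corrected the approach"]
```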

OpenClaw has nothing equivalent. If I want Finn to learn, I have to learn first, then update the prompt.

Memory: From "Vocabulary Hard-coded in Prompt" to "Memory the Agent Curates"

Finn's vocabulary mappings live in prompts.ts. They are static. To update them I edit a file, build, and redeploy.

Hermes has three memory primitives that change this completely:

SOUL.md: the personality file. Loaded first into every system prompt. This is "who the agent is."
USER.md: what the agent knows about the user. Updated by the agent over time. This is where "roxinho means Nubank, feira means grocery store" would naturally land.
MEMORY.md: operational notes the agent curates for itself, with periodic nudges to consolidate.
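Here is a sketch of what "vocabulary that grows" could look like when it lives in a text file instead of a compiled prompt. The bullet layout (`- alias: meaning`) and both helper functions are made up for illustration. This is not Hermes' actual USER.md schema, and in Hermes the agent edits the file itself rather than calling code like this.

```typescript
// Assumed layout: one "- alias: meaning" bullet per line. This format is
// invented for illustration and is not Hermes' actual USER.md schema.
function upsertAlias(userMd: string, alias: string, meaning: string): string {
  const entry = `- ${alias}: ${meaning}`;
  const prefix = `- ${alias}:`;
  const lines = userMd.split("\n");
  let replaced = false;
  for (let i = 0; i < lines.length; i++) {
    if (lines[i].indexOf(prefix) === 0) {
      lines[i] = entry; // update an existing alias in place
      replaced = true;
    }
  }
  if (!replaced) lines.push(entry); // otherwise append a new one
  return lines.join("\n");
}

function resolveAlias(userMd: string, word: string): string | undefined {
  const prefix = `- ${word}: `;
  const hit = userMd.split("\n").filter((line) => line.indexOf(prefix) === 0)[0];
  return hit ? hit.slice(prefix.length) : undefined;
}

let userMd = "# Vocabulary\n- roxinho: Nubank card\n- feira: grocery store";
userMd = upsertAlias(userMd, "Itubinho", "Itaú card"); // new nickname, no redeploy

const resolved = resolveAlias(userMd, "Itubinho"); // "Itaú card"
```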

On top of those, Hermes has session storage in SQLite with FTS5 full-text search, so the agent can search prior conversations by keyword. It also integrates Honcho for dialectic user modeling, a deeper layer that builds an evolving model of who you are across sessions.

The contrast with Finn-as-it-stands-today is sharp. In OpenClaw I am the memory. In Hermes the agent maintains its own.

Multi-Platform for Free

Finn lives on WhatsApp because that's the connector I built around. If I want it on Telegram, I write a new connector.

Hermes ships with native gateways for over 20 platforms out of the box — WhatsApp, Telegram, Discord, Slack, Signal, Matrix, Mattermost, Email, SMS, Microsoft Teams, Google Chat, and more. Same agent, same skills, same memory. One Hermes process can serve all of them simultaneously, with per-platform session isolation and unified user authorization.
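A small sketch of what per-platform session isolation with one shared allowlist can look like. Everything here is illustrative: the `IncomingMessage` shape, the `platform:sender` key, and the placeholder identifiers are my assumptions, not the Hermes gateway's real data model.

```typescript
// Hypothetical gateway model; platform names and identifiers are placeholders.
type Platform = "whatsapp" | "telegram" | "discord" | "signal";

interface IncomingMessage {
  platform: Platform;
  senderId: string; // phone number, handle, or platform user id
  text: string;
}

// One allowlist shared by every platform (unified authorization).
const allowedSenders: Record<string, true> = {
  "whatsapp:+10000000000": true,
  "telegram:@example_user": true,
};

// Sessions are keyed by platform + sender, so histories never mix.
const sessions: Record<string, string[]> = {};

function sessionKey(msg: IncomingMessage): string {
  return `${msg.platform}:${msg.senderId}`;
}

function dispatch(msg: IncomingMessage): string {
  const key = sessionKey(msg);
  if (!allowedSenders[key]) return "unauthorized";
  const history = sessions[key] || (sessions[key] = []);
  history.push(msg.text);
  return `${key} has ${history.length} message(s) in this session`;
}

const first = dispatch({ platform: "whatsapp", senderId: "+10000000000", text: "spent 45 on lunch" });
const second = dispatch({ platform: "telegram", senderId: "@example_user", text: "how much on food?" });
const blocked = dispatch({ platform: "discord", senderId: "stranger", text: "hi" }); // "unauthorized"
```

The WhatsApp and Telegram messages land in separate histories even though the same process handles both, which is the part I would otherwise have to build per connector.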

For Finn, that means my custom plugin code for WhatsApp routing, my deploy script, my session management — those layers either disappear or get absorbed into the gateway's configuration. The work I did to integrate one platform stops being a feature and becomes table stakes.

Side-by-Side: Finn on OpenClaw vs Finn on Hermes

Concern	Finn on OpenClaw (today)	Finn on Hermes (proposed)
Agent abstraction	One agent, 6 tool functions	One agent, N skills (markdown)
Adding a capability	New TypeScript file + deploy	New `SKILL.md` file
Vocabulary ("roxinho = Nubank")	Static in `prompts.ts`	Lives in `USER.md`, updated by agent
Cross-session context	Stateless per conversation	SQLite + FTS5, persistent across sessions
Self-improvement	None (I edit prompts)	`skill_manage` lets the agent create/update skills
Platforms	WhatsApp only (custom connector)	20+ native gateways
PDF parsing	Custom TS parsers (`parse-invoice.ts`, OCR fallback)	Skill with helper scripts under `~/.hermes/skills/finn-finance/scripts/`
Scheduled tasks	None (would require new infra)	First-class cron, deliverable to any platform
Language	TypeScript	Python (Hermes core) + markdown skills
Migration	—	`hermes claw migrate` (official command)

The last row is the one I want to underline. The Hermes CLI ships with a dedicated hermes claw migrate command that moves settings, memories, skills, and API keys from an OpenClaw setup directly to Hermes. That is not a hint that the projects are related. That is an official upgrade path.

Where OpenClaw Is Still the Right Choice

I want to be honest here, because most framework comparison posts pretend the winner is universal.

OpenClaw is still the right tool when:

Your team is a TypeScript team. Hermes core is Python. If your stack and your hires are TypeScript-first, the cognitive switch and the deploy story matter. A working OpenClaw plugin in your team's primary language can beat a "better" framework in a language nobody loves.
You need a small, focused, single-purpose agent and you want maximum determinism. A 5-iteration tool-use loop is easy to reason about, easy to debug, easy to put limits on. Hermes can do this too, but it has more layers between you and the model.
You don't need cross-platform reach. If WhatsApp is the only surface you'll ever need, the multi-platform gateway in Hermes is overhead you'll never use.
You don't want the agent rewriting things on its own. skill_manage and agent-curated memory are powerful, but they mean the agent's behavior surface evolves over time. If you need a behavior you wrote on Monday to be exactly the same on Friday, the more static OpenClaw plugin model is easier to audit.

OpenClaw didn't get worse. My ambitions for Finn outgrew its scope.

What You Give Up Choosing Hermes

The honest tradeoffs:

Less deterministic execution. Skills load dynamically; subagents can spawn at runtime via delegate_task. Most of the time, that's fine. For some compliance-critical paths, the static OpenClaw flow is easier to defend in a review.
Maturity gap. Hermes Agent is at v0.10.0. OpenClaw has been running personal agents for longer. Production readiness is a real consideration.
Language switch. Hermes is Python-first. My Finn codebase is TypeScript. The skills layer is markdown so it's portable, but custom helper scripts and tooling integrations would need to be rewritten or wrapped.
Lock-in to the Hermes mental model. Once your agent has accumulated dozens of learned skills over months, porting that institutional knowledge to a third framework is not trivial. The agent's procedural memory is an asset and a form of coupling.

The Migration Path Is Official

Here is what I appreciate about this specific migration: it is not a vague "you could probably do it" path. It is a documented, supported, one-command operation. The Hermes CLI ships with:

hermes claw migrate

This moves settings, memories, skills, and API keys from an OpenClaw installation into Hermes. The two projects share enough conceptual DNA that the migration is real, not aspirational.

My plan for Finn isn't a rewrite. It is a phased migration:

Phase 1 — Run side by side. Install Hermes locally. Create a log-expense skill that mirrors Finn's save_transaction tool. Point it at a sandbox Supabase. Use it on a second WhatsApp number for a week.

Phase 2 — Migrate the simple paths first. save_transaction, query_spending, update_transaction, delete_transaction are good candidates. They are mostly business logic plus a database call. Each becomes one SKILL.md.

Phase 3 — Tackle the hard cases. The PDF parsers — especially the saldo-diff algorithm for Bradesco statements — are real engineering. They become helper scripts under ~/.hermes/skills/finn-finance/scripts/, invoked from the skill via the terminal tool. The skill itself documents when and how to use the script; the Python or TypeScript helper does the actual parsing.

Phase 4 — Decommission OpenClaw. Only after Phase 3 is stable, point the production WhatsApp number at Hermes. Keep the OpenClaw plugin around for a rollback window. Eventually retire it.

The point isn't speed. The point is to not break the thing I use daily.

Closing

When I built Finn I wasn't thinking about whether OpenClaw was the right framework. I was thinking about whether I could ship a working WhatsApp finance agent in a few weekends. The framework choice was downstream of that goal, and OpenClaw made the answer "yes."

Hermes is asking a different question: what does an agent look like when the agent itself participates in its own evolution, through skills it writes, memory it curates, and knowledge it accumulates across sessions and platforms?

For a personal finance assistant that I expect to live with me for years, that question is more interesting than the question that brought me to OpenClaw. The migration is on my list.

If you have a working OpenClaw agent and you haven't read the Hermes Agent docs yet, do it before your next "I should add X to it" moment. You might end up writing one markdown file instead of one new TypeScript file.
