<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Vicente Junior</title>
    <description>The latest articles on DEV Community by Vicente Junior (@vicente_junior_dev).</description>
    <link>https://dev.to/vicente_junior_dev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3838257%2Fff6de2e1-5a5f-4aab-bdb3-2099aa03009b.jpg</url>
      <title>DEV Community: Vicente Junior</title>
      <link>https://dev.to/vicente_junior_dev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/vicente_junior_dev"/>
    <language>en</language>
    <item>
      <title>I Built a WhatsApp Finance Agent in OpenClaw. Migrating to Hermes Taught Me What "Self-Improving" Actually Means.</title>
      <dc:creator>Vicente Junior</dc:creator>
      <pubDate>Sat, 23 May 2026 22:15:53 +0000</pubDate>
      <link>https://dev.to/vicente_junior_dev/i-built-a-whatsapp-finance-agent-in-openclaw-migrating-to-hermes-taught-me-what-self-improving-3pp2</link>
      <guid>https://dev.to/vicente_junior_dev/i-built-a-whatsapp-finance-agent-in-openclaw-migrating-to-hermes-taught-me-what-self-improving-3pp2</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/hermes-agent-2026-05-15"&gt;Hermes Agent Challenge&lt;/a&gt;: Write About Hermes Agent&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I built &lt;a href="https://github.com/vicente-r-junior/finn" rel="noopener noreferrer"&gt;&lt;strong&gt;Finn&lt;/strong&gt;&lt;/a&gt; — a personal finance agent for WhatsApp — on &lt;strong&gt;OpenClaw&lt;/strong&gt;, in TypeScript, with a single agent and six tools.&lt;/li&gt;
&lt;li&gt;Migrating to &lt;strong&gt;Hermes Agent&lt;/strong&gt; isn't a framework swap. It's a paradigm shift from "an agent that executes" to "an agent that learns."&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;hermes claw migrate&lt;/code&gt; command makes the OpenClaw → Hermes path official. The gains are real (persistent memory, skill-based abstractions, multi-platform gateway), and so are the tradeoffs.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Meet Finn
&lt;/h2&gt;

&lt;p&gt;A few months ago I built &lt;a href="https://github.com/vicente-r-junior/finn" rel="noopener noreferrer"&gt;Finn&lt;/a&gt;, a personal finance assistant that lives entirely inside WhatsApp. No app to install, no dashboard to remember — just a chat:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You:  spent 45 on lunch
Finn: $45 · Food · Mastercard · Me · 2026-04-22 — confirm? ✅
You:  sim
Finn: ✅ Saved!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Under the hood, Finn is an OpenClaw plugin written in TypeScript. It runs a single agent with a &lt;code&gt;gpt-4.1&lt;/code&gt; tool-use loop (capped at 5 iterations), persists transactions to Supabase, parses credit card PDFs (text-based and OCR via &lt;code&gt;gpt-4o&lt;/code&gt;), and transcribes voice notes through Whisper. It speaks both Portuguese and English depending on the last message it received.&lt;/p&gt;

&lt;p&gt;I shipped it for the OpenClaw Challenge 2026 — you can &lt;a href="https://dev.to/vicente_junior_dev/finn-a-personal-finance-assistant-that-lives-in-whatsapp-2phh"&gt;see the full demo, code samples, and architecture walkthrough here&lt;/a&gt;. It works. I use it every day.&lt;/p&gt;

&lt;p&gt;And then I read the Hermes Agent docs.&lt;/p&gt;

&lt;p&gt;This post is about what happens when you take a working production agent built on OpenClaw and ask: &lt;em&gt;what would this look like in Hermes, and is the upgrade worth it?&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What Finn Already Does Well in OpenClaw
&lt;/h2&gt;

&lt;p&gt;Before I criticize anything, let me be fair to the framework I picked.&lt;/p&gt;

&lt;p&gt;OpenClaw gave me a clean abstraction that mapped naturally to "personal agent on WhatsApp":&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;&lt;code&gt;before_dispatch&lt;/code&gt; hook&lt;/strong&gt; that intercepts incoming messages and routes them through my plugin.&lt;/li&gt;
&lt;li&gt;A simple &lt;strong&gt;tool-use loop&lt;/strong&gt; — define a JSON schema, expose handler functions, let the model call them. Max 5 iterations per turn keeps cost and latency predictable.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Native WhatsApp gateway&lt;/strong&gt; through OpenClaw's connector model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Phone whitelist&lt;/strong&gt; at the gateway level, before my code even sees the message.&lt;/li&gt;
&lt;/ul&gt;
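&lt;p&gt;The capped tool-use loop from the list above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not OpenClaw's API: &lt;code&gt;Model&lt;/code&gt;, &lt;code&gt;ToolHandler&lt;/code&gt;, and &lt;code&gt;runToolLoop&lt;/code&gt; are hypothetical names, and the model is passed in as a plain function so the sketch runs without a network call.&lt;/p&gt;

```typescript
// Hypothetical shapes for illustration; these are not OpenClaw's real types.
type ToolArgs = { [key: string]: unknown };
type ToolCall = { tool: string; args: ToolArgs };
type ModelStep = { reply: string } | { call: ToolCall };
type ToolHandler = (args: ToolArgs) => string;
type Model = (transcript: string[]) => ModelStep;

const MAX_ITERATIONS = 5; // the per-turn cap mentioned above

function runToolLoop(
  model: Model,
  tools: { [name: string]: ToolHandler | undefined },
  userMessage: string,
): string {
  const transcript: string[] = ["user: " + userMessage];
  for (let i = 0; i !== MAX_ITERATIONS; i += 1) {
    const step = model(transcript);
    if ("reply" in step) {
      return step.reply; // the model answered, so the turn ends here
    }
    // The model asked for a tool: run it and feed the result back in.
    const handler = tools[step.call.tool];
    const result = handler
      ? handler(step.call.args)
      : "error: unknown tool " + step.call.tool;
    transcript.push("tool " + step.call.tool + ": " + result);
  }
  // Cap reached without a final answer: stop instead of looping forever.
  return "Sorry, I could not finish that within " + MAX_ITERATIONS + " steps.";
}
```

&lt;p&gt;Because the loop never makes more than five model calls per turn, the worst-case cost of a single message is known up front.&lt;/p&gt;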

&lt;p&gt;The six tools I exposed to the agent map directly to the things a finance assistant needs to do:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;save_transaction&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Persist a confirmed expense, income, or card payment&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;query_spending&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Query totals and breakdowns from the database&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;save_bulk_transactions&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Bulk-save invoice items from a PDF import&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;save_bank_statement&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Bulk-save bank statement rows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;update_transaction&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Edit a saved record (with confirmation)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;delete_transaction&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Delete a saved record (with confirmation)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Every mutation requires explicit user confirmation before the tool is called. The whole architecture fits on one page:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;WhatsApp
    │
    ▼
OpenClaw Gateway  ──(before_dispatch hook)──▶  Finn Plugin (TypeScript)
                                                  │
                            ┌─────────────────────┤
                            ▼                     ▼
                       Text / Audio              PDF
                            │                     │
                       runAgent()           Custom parsers
                       (gpt-4.1 loop)      (text + OCR fallback)
                            │                     │
                            └──────────┬──────────┘
                                       ▼
                                  Supabase
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It's a clean design. I'm proud of it. But the longer I used Finn, the longer my wishlist got.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Walls I Hit
&lt;/h2&gt;

&lt;p&gt;After a few weeks of daily use, three things started bothering me:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Finn forgets everything between conversations.&lt;/strong&gt;&lt;br&gt;
When I say "roxinho" I mean my Nubank card. When I say "feira" I mean the grocery store. Today, those mappings live in the system prompt. They don't grow. If next week I start using "Itubinho" for Itaú, I have to edit the prompt and redeploy. The agent is not learning. I am.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Finn only lives on WhatsApp.&lt;/strong&gt;&lt;br&gt;
If I want the same assistant on Telegram (where my parents are), on Discord (where my work crew lives), or on Signal (where I keep some chats), I have to build a new connector each time, or fork the plugin and reinstall it elsewhere.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Every new capability is a code change.&lt;/strong&gt;&lt;br&gt;
The "saldo-diff algorithm" I built for Bradesco statements is custom code, deployed via my own &lt;code&gt;deploy.sh&lt;/code&gt; script. If I want a Nubank statement parser tomorrow, that's a new TypeScript file, a new test suite, a new deploy. There is no abstraction layer between "I figured out a new way to do something" and "I shipped TypeScript to production."&lt;/p&gt;

&lt;p&gt;These aren't OpenClaw's fault, really. They are the limits of building on a framework that thinks of agents as &lt;strong&gt;executors of tools&lt;/strong&gt; — the agent's job is to pick the right tool, with the right arguments, in the right order, and stop.&lt;/p&gt;

&lt;p&gt;What I wanted was an agent that &lt;strong&gt;accumulated knowledge&lt;/strong&gt; the way I do. That was when Hermes started looking interesting.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Shift Hermes Makes: From Tools to Skills
&lt;/h2&gt;

&lt;p&gt;Here is the core conceptual change.&lt;/p&gt;

&lt;p&gt;In OpenClaw, the agent's capability surface is &lt;strong&gt;a list of tools&lt;/strong&gt;. Each tool is a function with a JSON schema. The agent picks one, calls it, observes the result, and decides what to do next.&lt;/p&gt;

&lt;p&gt;In Hermes, the agent's capability surface is &lt;strong&gt;a list of skills&lt;/strong&gt;. A skill is a markdown file — &lt;code&gt;SKILL.md&lt;/code&gt; — with YAML frontmatter and instructions. The agent loads skills on demand, using &lt;strong&gt;progressive disclosure&lt;/strong&gt;: at level zero it only sees skill names and short descriptions (around 3k tokens for the whole catalog), at level one it loads the full skill content, at level two it can pull additional reference files inside the skill's directory. Tools still exist (Hermes ships 70+ of them across 28 toolsets), but skills are the primary abstraction for &lt;em&gt;how the agent works on a problem&lt;/em&gt;.&lt;/p&gt;
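&lt;p&gt;The three disclosure levels described above can be illustrated with a small sketch. It is not Hermes's loader: the &lt;code&gt;Skill&lt;/code&gt; shape, the function names, and the example skill are assumptions, and the skills are held in memory here instead of being read from &lt;code&gt;SKILL.md&lt;/code&gt; files on disk.&lt;/p&gt;

```typescript
// Hypothetical in-memory shape; a real skill is a directory with SKILL.md in it.
type Skill = {
  name: string;
  description: string;
  body: string;
  references: { [file: string]: string };
};

// Level 0: only names and short descriptions go into the prompt.
function renderCatalog(skills: Skill[]): string {
  return skills.map((s) => "- " + s.name + ": " + s.description).join("\n");
}

// Level 1: the full instructions of one skill, loaded only when needed.
function loadSkillBody(skills: Skill[], name: string): string | null {
  const skill = skills.find((s) => s.name === name);
  return skill === undefined ? null : skill.body;
}

// Level 2: a single reference file from inside that skill's directory.
function loadReference(skills: Skill[], name: string, file: string): string | null {
  const skill = skills.find((s) => s.name === name);
  if (skill === undefined) return null;
  const text = skill.references[file];
  return text === undefined ? null : text;
}

// Example data, made up for this sketch.
const exampleSkills: Skill[] = [
  {
    name: "log-expense",
    description: "Log a personal expense with smart defaults",
    body: "## Procedure\n1. Parse the amount...",
    references: { "categories.md": "Food, Transport, Coffee" },
  },
];

console.log(renderCatalog(exampleSkills));
```

&lt;p&gt;The point of the split is that the prompt pays for the catalog on every turn, but pays for a skill's body and reference files only on the turns that actually use them.&lt;/p&gt;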

&lt;p&gt;This is what Finn's &lt;code&gt;save_transaction&lt;/code&gt; tool would look like as a Hermes skill:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;log-expense&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Log a personal expense to the finance database with smart defaults&lt;/span&gt;
&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1.0.0&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;hermes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;finance&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;personal&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;daily&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;category&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;finance&lt;/span&gt;
    &lt;span class="na"&gt;requires_toolsets&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;terminal&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gh"&gt;# Log Expense Skill&lt;/span&gt;

&lt;span class="gu"&gt;## When to Use&lt;/span&gt;
The user mentions a purchase, payment, or expense, with or without a category.
Examples: "spent 45 on lunch", "paid Netflix 55.90", "almoço 35 ontem".

&lt;span class="gu"&gt;## Defaults&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Card: Mastercard (unless user names another)
&lt;span class="p"&gt;-&lt;/span&gt; Cost center: Me (unless user names another person)
&lt;span class="p"&gt;-&lt;/span&gt; Date: today (unless user names another date)
&lt;span class="p"&gt;-&lt;/span&gt; Never ask about defaults — apply them silently.

&lt;span class="gu"&gt;## Procedure&lt;/span&gt;
&lt;span class="p"&gt;1.&lt;/span&gt; Parse amount, description, and any explicit metadata from the message.
&lt;span class="p"&gt;2.&lt;/span&gt; Apply defaults to anything not specified.
&lt;span class="p"&gt;3.&lt;/span&gt; Echo the parsed transaction back to the user for confirmation.
&lt;span class="p"&gt;4.&lt;/span&gt; On confirmation ("sim", "yes", "✅"), write to Supabase via the terminal tool.
&lt;span class="p"&gt;5.&lt;/span&gt; Reply with a success message including the saved record ID and the running total for the category.

&lt;span class="gu"&gt;## Pitfalls&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Do not trust relative dates ("last Tuesday") without confirming the actual date.
&lt;span class="p"&gt;-&lt;/span&gt; Round currency to 2 decimal places when displaying.
&lt;span class="p"&gt;-&lt;/span&gt; Watch for duplicate detection warnings before saving.

&lt;span class="gu"&gt;## Verification&lt;/span&gt;
The success message includes the saved record ID and the running category total.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is the entire &lt;code&gt;save_transaction&lt;/code&gt; capability, expressed as procedural knowledge instead of a function signature. There's no TypeScript to compile, no schema to maintain in two places, no deploy. If next month I figure out that "Wednesday is my coffee day, default the category to Coffee on Wednesdays unless overridden," I add three lines to the markdown file. Done.&lt;/p&gt;

&lt;p&gt;But the bigger shift isn't the syntax. It is that &lt;strong&gt;the agent itself can write and update these skills&lt;/strong&gt; through a tool called &lt;code&gt;skill_manage&lt;/code&gt;. This is the agent's procedural memory: when it figures out a non-trivial workflow that works, it can save the approach as a new skill for next time. Hermes offers to crystallize what it learned into a reusable skill at three moments: after it completes a complex task with 5+ tool calls, after the user corrects its approach, or after it hits errors and finds a working path.&lt;/p&gt;

&lt;p&gt;OpenClaw has nothing equivalent. If I want Finn to learn, &lt;em&gt;I&lt;/em&gt; have to learn first, then update the prompt.&lt;/p&gt;




&lt;h2&gt;
  
  
  Memory: From "Vocabulary Hard-coded in Prompt" to "Memory the Agent Curates"
&lt;/h2&gt;

&lt;p&gt;Finn's vocabulary mappings live in &lt;code&gt;prompts.ts&lt;/code&gt;. They are static. To update them I edit a file, build, and redeploy.&lt;/p&gt;

&lt;p&gt;Hermes has three memory primitives that change this completely:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;SOUL.md&lt;/code&gt;&lt;/strong&gt; — the personality file. Loaded first into every system prompt. This is "who the agent is."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;USER.md&lt;/code&gt;&lt;/strong&gt; — what the agent knows about the user. Updated by the agent over time. This is where "roxinho means Nubank, feira means grocery store" would naturally land.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;MEMORY.md&lt;/code&gt;&lt;/strong&gt; — operational notes the agent curates for itself, with periodic nudges to consolidate.&lt;/li&gt;
&lt;/ul&gt;
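&lt;p&gt;To make the difference from a constant in &lt;code&gt;prompts.ts&lt;/code&gt; concrete, here is a rough sketch of vocabulary kept as editable text that can grow at runtime. The one-alias-per-line format and the &lt;code&gt;learnAlias&lt;/code&gt; and &lt;code&gt;resolveAlias&lt;/code&gt; helpers are assumptions made for this example. They are not the actual layout of &lt;code&gt;USER.md&lt;/code&gt; or part of Hermes.&lt;/p&gt;

```typescript
// Assumed line format for this sketch: "- alias: meaning".
const ALIAS_LINE = /^- (.+?): (.+)$/;

// Read every alias line from the text; lines in any other shape are ignored.
function parseAliases(userMd: string): { [alias: string]: string } {
  const aliases: { [alias: string]: string } = {};
  for (const line of userMd.split("\n")) {
    const match = ALIAS_LINE.exec(line.trim());
    if (match !== null) {
      aliases[match[1].toLowerCase()] = match[2];
    }
  }
  return aliases;
}

// Add or overwrite one alias and return the updated text.
// Simplification: the text is assumed to hold alias lines only.
function learnAlias(userMd: string, alias: string, meaning: string): string {
  const aliases = parseAliases(userMd);
  aliases[alias.toLowerCase()] = meaning; // the newest meaning wins
  return Object.keys(aliases)
    .map((key) => "- " + key + ": " + aliases[key])
    .join("\n");
}

// Look up what the user means by a word, or null if it was never learned.
function resolveAlias(userMd: string, word: string): string | null {
  const aliases = parseAliases(userMd);
  const key = word.toLowerCase();
  return Object.prototype.hasOwnProperty.call(aliases, key) ? aliases[key] : null;
}
```

&lt;p&gt;With this shape, learning "Itubinho" is a call to &lt;code&gt;learnAlias&lt;/code&gt; during a conversation, not an edit to the prompt followed by a redeploy.&lt;/p&gt;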

&lt;p&gt;On top of those, Hermes has session storage in &lt;strong&gt;SQLite with FTS5 full-text search&lt;/strong&gt;, so the agent can run keyword searches over prior conversations. It also integrates &lt;a href="https://github.com/plastic-labs/honcho" rel="noopener noreferrer"&gt;Honcho&lt;/a&gt; for dialectic user modeling, a deeper layer that builds an evolving model of who you are across sessions.&lt;/p&gt;

&lt;p&gt;The contrast with Finn-as-it-stands-today is sharp. In OpenClaw I am the memory. In Hermes the agent maintains its own.&lt;/p&gt;




&lt;h2&gt;
  
  
  Multi-Platform for Free
&lt;/h2&gt;

&lt;p&gt;Finn lives on WhatsApp because that's the connector I built around. If I want it on Telegram, I write a new connector.&lt;/p&gt;

&lt;p&gt;Hermes ships with native gateways for &lt;strong&gt;over 20 platforms&lt;/strong&gt; out of the box — WhatsApp, Telegram, Discord, Slack, Signal, Matrix, Mattermost, Email, SMS, Microsoft Teams, Google Chat, and more. Same agent, same skills, same memory. One Hermes process can serve all of them simultaneously, with per-platform session isolation and unified user authorization.&lt;/p&gt;

&lt;p&gt;For Finn, that means my custom plugin code for WhatsApp routing, my deploy script, my session management — those layers either disappear or get absorbed into the gateway's configuration. The work I did to integrate one platform stops being a feature and becomes table stakes.&lt;/p&gt;




&lt;h2&gt;
  
  
  Side-by-Side: Finn on OpenClaw vs Finn on Hermes
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Concern&lt;/th&gt;
&lt;th&gt;Finn on OpenClaw (today)&lt;/th&gt;
&lt;th&gt;Finn on Hermes (proposed)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Agent abstraction&lt;/td&gt;
&lt;td&gt;One agent, 6 tool functions&lt;/td&gt;
&lt;td&gt;One agent, N skills (markdown)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Adding a capability&lt;/td&gt;
&lt;td&gt;New TypeScript file + deploy&lt;/td&gt;
&lt;td&gt;New &lt;code&gt;SKILL.md&lt;/code&gt; file&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vocabulary ("roxinho" = Nubank)&lt;/td&gt;
&lt;td&gt;Static in &lt;code&gt;prompts.ts&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Lives in &lt;code&gt;USER.md&lt;/code&gt;, updated by agent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross-session context&lt;/td&gt;
&lt;td&gt;No memory between conversations&lt;/td&gt;
&lt;td&gt;SQLite + FTS5, persistent across sessions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Self-improvement&lt;/td&gt;
&lt;td&gt;None (I edit prompts)&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;skill_manage&lt;/code&gt; lets the agent create/update skills&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Platforms&lt;/td&gt;
&lt;td&gt;WhatsApp only (custom connector)&lt;/td&gt;
&lt;td&gt;20+ native gateways&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PDF parsing&lt;/td&gt;
&lt;td&gt;Custom TS parsers (&lt;code&gt;parse-invoice.ts&lt;/code&gt;, OCR fallback)&lt;/td&gt;
&lt;td&gt;Skill with helper scripts under &lt;code&gt;~/.hermes/skills/finn-finance/scripts/&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scheduled tasks&lt;/td&gt;
&lt;td&gt;None (would require new infra)&lt;/td&gt;
&lt;td&gt;First-class cron, deliverable to any platform&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Language&lt;/td&gt;
&lt;td&gt;TypeScript&lt;/td&gt;
&lt;td&gt;Python (Hermes core) + markdown skills&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Migration&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;hermes claw migrate&lt;/code&gt; (official command)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The last row is the one I want to underline. The Hermes CLI ships with a dedicated &lt;code&gt;hermes claw migrate&lt;/code&gt; command that moves settings, memories, skills, and API keys from an OpenClaw setup directly to Hermes. That is not a hint that the projects are related. That is an official upgrade path.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where OpenClaw Is Still the Right Choice
&lt;/h2&gt;

&lt;p&gt;I want to be honest here, because most framework comparison posts pretend the winner is universal.&lt;/p&gt;

&lt;p&gt;OpenClaw is still the right tool when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Your team is a TypeScript team.&lt;/strong&gt; Hermes core is Python. If your stack and your hires are TypeScript-first, the cognitive switch and the deploy story matter. A working OpenClaw plugin in your team's primary language can beat a "better" framework in a language nobody loves.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You need a small, focused, single-purpose agent and you want maximum determinism.&lt;/strong&gt; A 5-iteration tool-use loop is easy to reason about, easy to debug, easy to put limits on. Hermes can do this too, but it has more layers between you and the model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You don't need cross-platform reach.&lt;/strong&gt; If WhatsApp is the only surface you'll ever need, the multi-platform gateway in Hermes is overhead you'll never use.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You don't want the agent rewriting things on its own.&lt;/strong&gt; &lt;code&gt;skill_manage&lt;/code&gt; and agent-curated memory are powerful, but they mean the agent's behavior surface evolves over time. If you need a behavior you wrote on Monday to be exactly the same on Friday, the more static OpenClaw plugin model is easier to audit.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;OpenClaw didn't get worse. My ambitions for Finn outgrew its scope.&lt;/p&gt;




&lt;h2&gt;
  
  
  What You Give Up Choosing Hermes
&lt;/h2&gt;

&lt;p&gt;The honest tradeoffs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Less deterministic execution.&lt;/strong&gt; Skills load dynamically; subagents can spawn at runtime via &lt;code&gt;delegate_task&lt;/code&gt;. Most of the time, that's fine. For some compliance-critical paths, the static OpenClaw flow is easier to defend in a review.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maturity gap.&lt;/strong&gt; Hermes Agent is at &lt;strong&gt;v0.10.0&lt;/strong&gt;. OpenClaw has been running personal agents for longer. Production readiness is a real consideration.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Language switch.&lt;/strong&gt; Hermes is Python-first. My Finn codebase is TypeScript. The skills layer is markdown so it's portable, but custom helper scripts and tooling integrations would need to be rewritten or wrapped.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lock-in to the Hermes mental model.&lt;/strong&gt; Once your agent has accumulated dozens of learned skills over months, porting that institutional knowledge to a third framework is not trivial. The agent's procedural memory is an asset &lt;em&gt;and&lt;/em&gt; a form of coupling.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Migration Path Is Official
&lt;/h2&gt;

&lt;p&gt;Here is what I appreciate about this specific migration: it is not a vague "you could probably do it" path. It is a documented, supported, one-command operation. The Hermes CLI ships with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hermes claw migrate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This moves settings, memories, skills, and API keys from an OpenClaw installation into Hermes. The two projects share enough conceptual DNA that the migration is real, not aspirational.&lt;/p&gt;

&lt;p&gt;My plan for Finn isn't a rewrite. It is a phased migration:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 1 — Run side by side.&lt;/strong&gt; Install Hermes locally. Create a &lt;code&gt;log-expense&lt;/code&gt; skill that mirrors Finn's &lt;code&gt;save_transaction&lt;/code&gt; tool. Point it at a sandbox Supabase. Use it on a second WhatsApp number for a week.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 2 — Migrate the simple paths first.&lt;/strong&gt; &lt;code&gt;save_transaction&lt;/code&gt;, &lt;code&gt;query_spending&lt;/code&gt;, &lt;code&gt;update_transaction&lt;/code&gt;, &lt;code&gt;delete_transaction&lt;/code&gt; are good candidates. They are mostly business logic plus a database call. Each becomes one &lt;code&gt;SKILL.md&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 3 — Tackle the hard cases.&lt;/strong&gt; The PDF parsers — especially the saldo-diff algorithm for Bradesco statements — are real engineering. They become helper scripts under &lt;code&gt;~/.hermes/skills/finn-finance/scripts/&lt;/code&gt;, invoked from the skill via the terminal tool. The skill itself documents &lt;em&gt;when&lt;/em&gt; and &lt;em&gt;how&lt;/em&gt; to use the script; the Python or TypeScript helper does the actual parsing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 4 — Decommission OpenClaw.&lt;/strong&gt; Only after Phase 3 is stable, point the production WhatsApp number at Hermes. Keep the OpenClaw plugin around for a rollback window. Eventually retire it.&lt;/p&gt;

&lt;p&gt;The point isn't speed. The point is to not break the thing I use daily.&lt;/p&gt;




&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;When I built Finn I wasn't thinking about whether OpenClaw was the right framework. I was thinking about whether I could ship a working WhatsApp finance agent in a few weekends. The framework choice was downstream of that goal, and OpenClaw made the answer "yes."&lt;/p&gt;

&lt;p&gt;Hermes is asking a different question — what does an agent look like when &lt;strong&gt;the agent itself participates in its own evolution&lt;/strong&gt; — through skills it writes, memory it curates, knowledge it accumulates across sessions and platforms.&lt;/p&gt;

&lt;p&gt;For a personal finance assistant that I expect to live with me for years, that question is more interesting than the question that brought me to OpenClaw. The migration is on my list.&lt;/p&gt;

&lt;p&gt;If you have a working OpenClaw agent and you haven't read the Hermes Agent docs yet, do it before your next "I should add X to it" moment. You might end up writing one markdown file instead of one new TypeScript file.&lt;/p&gt;




&lt;h2&gt;
  
  
  Further Reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://hermes-agent.nousresearch.com/docs/" rel="noopener noreferrer"&gt;Hermes Agent Documentation&lt;/a&gt; — the architecture overview is the best 20-minute investment&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://hermes-agent.nousresearch.com/docs/user-guide/features/skills" rel="noopener noreferrer"&gt;Skills System&lt;/a&gt; — the spec for skill files, progressive disclosure, agent-managed updates&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://hermes-agent.nousresearch.com/docs/developer-guide/agent-loop" rel="noopener noreferrer"&gt;Agent Loop Internals&lt;/a&gt; — for those who want to see how the engine actually works&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://agentskills.io" rel="noopener noreferrer"&gt;agentskills.io&lt;/a&gt; — the open standard skills follow, so they are portable beyond Hermes&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/vicente-r-junior/finn" rel="noopener noreferrer"&gt;Finn on GitHub&lt;/a&gt; — the OpenClaw plugin discussed throughout this post&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/vicente_junior_dev/finn-a-personal-finance-assistant-that-lives-in-whatsapp-2phh"&gt;Finn — A Personal Finance Assistant That Lives in WhatsApp&lt;/a&gt; — my detailed OpenClaw Challenge write-up with full demos and code&lt;/li&gt;
&lt;/ul&gt;







</description>
      <category>hermesagentchallenge</category>
      <category>devchallenge</category>
      <category>agents</category>
    </item>
    <item>
      <title>I gave Gemini 3.5 Flash a CVE-fix PR to review. It found another bug in the same file.</title>
      <dc:creator>Vicente Junior</dc:creator>
      <pubDate>Fri, 22 May 2026 22:40:14 +0000</pubDate>
      <link>https://dev.to/vicente_junior_dev/i-gave-gemini-35-flash-a-cve-fix-pr-to-review-it-found-another-bug-in-the-same-file-1g24</link>
      <guid>https://dev.to/vicente_junior_dev/i-gave-gemini-35-flash-a-cve-fix-pr-to-review-it-found-another-bug-in-the-same-file-1g24</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-io-writing-2026-05-19"&gt;Google I/O Writing Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Across &lt;strong&gt;3 real production PRs&lt;/strong&gt;, I asked Gemini 3.5 Flash to do a code review. The model — announced this week at Google I/O 2026 — caught &lt;strong&gt;3 legitimate bugs, hallucinated 0&lt;/strong&gt;, in roughly 4 seconds per PR. The middle PR was the patch for a known security vulnerability in Fastify (CVE-2026-25223, a validation-bypass). The model flagged a second, unrelated regex bug &lt;strong&gt;in the exact file being patched&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Here's what I learned building a code-review agent in about 2 hours with Google's new model.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I tested this
&lt;/h2&gt;

&lt;p&gt;At the I/O keynote, Sundar Pichai pitched Gemini 3.5 Flash as "frontier intelligence combined with action" — optimized for agentic coding and long-horizon tasks. Code review is the perfect stress test: it requires reasoning about code semantics, cross-file context, and judgment about what matters.&lt;/p&gt;

&lt;p&gt;Reading another 50 hype threads on X felt pointless. So I built the smallest possible agent that could actually use the model on real code, ran it on three concrete PRs, and counted what it got right, what it made up, and what it missed.&lt;/p&gt;

&lt;h2&gt;
  
  
  The architecture
&lt;/h2&gt;

&lt;p&gt;Three stages, ~80 lines of TypeScript, runs on Node 20+:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INPUT                  PROCESSING                       OUTPUT
─────                  ──────────                       ──────
owner/repo#N    →      1. fetch the .diff URL      →    stdout (colored summary)
                       2. truncate if &amp;gt; 150k chars      out/{slug}.json
                       3. build prompt + schema         out/{slug}.md
                       4. Gemini 3.5 Flash call
                       5. Zod-parse the response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No GitHub token (public PRs use the unauthenticated &lt;code&gt;.diff&lt;/code&gt; URL). No octokit. No frameworks. Just the new &lt;code&gt;@google/genai&lt;/code&gt; SDK with structured output.&lt;/p&gt;
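&lt;p&gt;The first two steps of the pipeline above (resolve the &lt;code&gt;owner/repo#N&lt;/code&gt; input to a &lt;code&gt;.diff&lt;/code&gt; URL, then truncate at 150k characters) might look roughly like this. The helper names and the truncation marker are hypothetical, and the sketch leaves out the actual network fetch so it runs offline.&lt;/p&gt;

```typescript
const MAX_DIFF_CHARS = 150000; // the truncation limit from the diagram above

type PullRef = { owner: string; repo: string; pull: number };

// Parse "owner/repo#N" into its parts, failing loudly on anything else.
function parsePullRef(ref: string): PullRef {
  const match = /^([\w.-]+)\/([\w.-]+)#(\d+)$/.exec(ref.trim());
  if (match === null) {
    throw new Error("Expected owner/repo#N, got: " + ref);
  }
  return { owner: match[1], repo: match[2], pull: Number(match[3]) };
}

// Public pull requests expose their diff at the PR URL with ".diff" appended.
function diffUrl(ref: PullRef): string {
  return "https://github.com/" + ref.owner + "/" + ref.repo + "/pull/" + ref.pull + ".diff";
}

// Keep the prompt bounded: cut oversized diffs and say so explicitly.
function truncateDiff(diff: string, maxChars: number = MAX_DIFF_CHARS): string {
  if (diff.length > maxChars) {
    return diff.slice(0, maxChars) + "\n[diff truncated]";
  }
  return diff;
}
```

&lt;p&gt;Appending a visible marker when the diff is cut lets the model know it is reviewing a partial change, which matters when judging whether a reported issue might be resolved further down.&lt;/p&gt;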

&lt;h2&gt;
  
  
  The core
&lt;/h2&gt;

&lt;p&gt;The heart of the pipeline is a single &lt;code&gt;review()&lt;/code&gt; function — pass it a diff, get back a typed array of issues:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;GoogleGenAI&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@google/genai&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;zod&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;zodToJsonSchema&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;zod-to-json-schema&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ai&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;GoogleGenAI&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;GEMINI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;IssueSchema&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="na"&gt;line&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;number&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;nullable&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="na"&gt;severity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enum&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;low&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;medium&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;high&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;critical&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
  &lt;span class="na"&gt;category&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enum&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;bug&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;security&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;performance&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;style&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;logic&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;maintainability&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
  &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="na"&gt;suggestion&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;nullable&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ReviewSchema&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="na"&gt;issues&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;IssueSchema&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;SYSTEM_PROMPT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`You are a senior code reviewer. Analyze the unified git
diff below and produce a JSON review.

Rules:
- Flag REAL issues only — no nitpicks, no style preferences.
- Prefer fewer, higher-quality issues over volume.
- Each "message" must explain WHY it matters (impact, not just observation).
- If you cannot see enough context to be sure, lower the severity.

Return the full review as JSON matching the provided schema.`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;review&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;diff&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;ai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generateContent&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gemini-3.5-flash&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;contents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;SYSTEM_PROMPT&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;\n\n--- DIFF ---\n&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;diff&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;responseMimeType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;responseJsonSchema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;zodToJsonSchema&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ReviewSchema&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;ReviewSchema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;{}&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few details worth flagging:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Model string:&lt;/strong&gt; &lt;code&gt;"gemini-3.5-flash"&lt;/code&gt;. GA since May 19, 2026.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structured output:&lt;/strong&gt; use &lt;code&gt;responseJsonSchema&lt;/code&gt; (not the older &lt;code&gt;responseSchema&lt;/code&gt;). The API constrains the response to the Zod-derived schema and returns conformant JSON, so there is no regex-parsing of the response. The final &lt;code&gt;ReviewSchema.parse&lt;/code&gt; call is a safety net: it still throws if the response is empty or does not match.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No temperature tuning:&lt;/strong&gt; Google explicitly recommends not setting &lt;code&gt;temperature&lt;/code&gt;, &lt;code&gt;top_p&lt;/code&gt;, or &lt;code&gt;top_k&lt;/code&gt; on the 3.5 family — the model handles sampling internally.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Full repo at the end. Now the interesting part.&lt;/p&gt;

&lt;h2&gt;
  
  
  The three PRs
&lt;/h2&gt;

&lt;p&gt;I picked PRs with very different shapes to see how the model behaved across contexts.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;PR&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Lines&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/expressjs/express/pull/6190" rel="noopener noreferrer"&gt;express#6190&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Small refactor&lt;/td&gt;
&lt;td&gt;~10&lt;/td&gt;
&lt;td&gt;Baseline: clean code, no real issues&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/fastify/fastify/pull/6414" rel="noopener noreferrer"&gt;fastify#6414&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Security-sensitive&lt;/td&gt;
&lt;td&gt;+398 / −147&lt;/td&gt;
&lt;td&gt;The patch for CVE-2026-25223&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://github.com/expressjs/express/pull/6100" rel="noopener noreferrer"&gt;express#6100&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Small refactor&lt;/td&gt;
&lt;td&gt;~15&lt;/td&gt;
&lt;td&gt;Different file, different style&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Final scorecard
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PR #1 (express#6190):    +0  −0   Model agreed: no issues
PR #2 (fastify#6414):    +3  −0   3 hits, 0 hallucinations
PR #3 (express#6100):    +0  −0   Model agreed: no issues
──────────────────────────────────────────────────────────────
Total:                   +3  −0   Zero false positives.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What it caught — the headline
&lt;/h2&gt;

&lt;p&gt;PR #2 is the one that mattered. Fastify pull #6414 rewrote the entire content-type parser to fix a security flaw (CVE-2026-25223) where attackers could bypass body validation by appending a tab character to &lt;code&gt;Content-Type&lt;/code&gt; (e.g. &lt;code&gt;application/json\tx&lt;/code&gt;). The fix introduced a new &lt;code&gt;ContentType&lt;/code&gt; class and replaced the old loose string-matching logic.&lt;/p&gt;

&lt;p&gt;This is exactly the kind of high-stakes, security-sensitive refactor where an automated reviewer either earns its place or doesn't.&lt;/p&gt;

&lt;p&gt;The model flagged three issues. Here's each one, verified against the actual code.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hit 1: inconsistent variable use in &lt;code&gt;existingParser&lt;/code&gt;
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;MEDIUM · logic&lt;/strong&gt; — The &lt;code&gt;existingParser&lt;/code&gt; method checks &lt;code&gt;contentType === "application/json"&lt;/code&gt; and &lt;code&gt;this.customParsers.has(contentType)&lt;/code&gt; using the original &lt;code&gt;contentType&lt;/code&gt; string instead of the newly calculated, normalized &lt;code&gt;ct&lt;/code&gt; variable.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Looking at the new code in &lt;code&gt;lib/content-type-parser.js&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;ContentTypeParser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;prototype&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;existingParser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;function &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;contentType&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;typeof&lt;/span&gt; &lt;span class="nx"&gt;contentType&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ct&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ContentType&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;contentType&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;contentType&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;customParsers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;has&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;contentType&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;customParsers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;fn&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;kDefaultJsonParse&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;contentType&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;text/plain&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;customParsers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;has&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;contentType&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;customParsers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ct&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;fn&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="nx"&gt;defaultPlainTextParser&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;hasParser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;contentType&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model is right. &lt;code&gt;ct&lt;/code&gt; is the normalized version, but the conditional guards still test the raw &lt;code&gt;contentType&lt;/code&gt;. Since &lt;code&gt;customParsers&lt;/code&gt; only holds normalized keys (see line 85: &lt;code&gt;this.customParsers.set(normalizedContentType, parser)&lt;/code&gt;), any header with a different case or trailing parameters silently skips the fast path. Subtle, easy to miss in review.&lt;/p&gt;
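
&lt;p&gt;The mismatch is easier to see in isolation. The sketch below is not Fastify code: &lt;code&gt;normalize&lt;/code&gt; is a hypothetical stand-in for &lt;code&gt;new ContentType(raw).toString()&lt;/code&gt;, and the map only mimics the idea that parsers are stored under normalized keys.&lt;/p&gt;

```typescript
// Hypothetical normalizer standing in for the ContentType class;
// the real class does more than lowercasing and dropping parameters.
function normalize(raw: string): string {
  return raw.split(";")[0].trim().toLowerCase();
}

// Parsers are registered under normalized keys only.
const customParsers = new Map([[normalize("Application/JSON"), "custom-json-parser"]]);

const header = "Application/JSON; charset=utf-8";
const rawLookup = customParsers.has(header);                   // raw header as key
const normalizedLookup = customParsers.has(normalize(header)); // normalized key

console.log(rawLookup, normalizedLookup); // false true
```

&lt;p&gt;A lookup keyed on the raw header misses the entry that a lookup on the normalized value finds, which is the inconsistency the model pointed at.&lt;/p&gt;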

&lt;h3&gt;
  
  
  Hit 2: a regex missing its end anchor
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;HIGH · security&lt;/strong&gt; — The &lt;code&gt;subtypeNameReg&lt;/code&gt; regular expression is missing a trailing &lt;code&gt;$&lt;/code&gt; anchor. Consequently, any string starting with a valid subtype will match successfully.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This one is the headline. In the &lt;strong&gt;brand new file&lt;/strong&gt; &lt;code&gt;lib/content-type.js&lt;/code&gt;, the patch defines two parallel regexes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;typeNameReg&lt;/span&gt;     &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sr"&gt;/^&lt;/span&gt;&lt;span class="se"&gt;[\w&lt;/span&gt;&lt;span class="sr"&gt;!#$%&amp;amp;'*+.^`|~-&lt;/span&gt;&lt;span class="se"&gt;]&lt;/span&gt;&lt;span class="sr"&gt;+$/&lt;/span&gt;      &lt;span class="c1"&gt;// has $&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;subtypeNameReg&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sr"&gt;/^&lt;/span&gt;&lt;span class="se"&gt;[\w&lt;/span&gt;&lt;span class="sr"&gt;!#$%&amp;amp;'*+.^`|~-&lt;/span&gt;&lt;span class="se"&gt;]&lt;/span&gt;&lt;span class="sr"&gt;+&lt;/span&gt;&lt;span class="se"&gt;\s&lt;/span&gt;&lt;span class="sr"&gt;*/&lt;/span&gt;    &lt;span class="c1"&gt;// no $&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The subtype regex anchors at the start but not at the end. Inputs like &lt;code&gt;application/json/extra&lt;/code&gt; pass the validation gate where they shouldn't. In a PR whose entire purpose is fixing a validation-bypass CVE, a senior reviewer would flag this in red on the first pass. The model rated it HIGH on its first pass.&lt;/p&gt;

&lt;p&gt;I am not claiming this is itself exploitable at the same severity as the original CVE — the downstream parsers may not be reachable in a way that materializes the bug. But the pattern is exactly the class of issue that &lt;em&gt;did&lt;/em&gt; materialize as CVE-2026-25223. Pattern-recognition of dangerous shapes is half of what code review is.&lt;/p&gt;
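
&lt;p&gt;To make the anchor difference concrete, here is a small sketch with a simplified character class (the patch's class is longer) applied to a hypothetical subtype string. It shows only the regex behavior, not how Fastify extracts the subtype.&lt;/p&gt;

```typescript
// Simplified character class for illustration; the patch's class is longer.
const unanchored = /^[\w.+-]+\s*/; // start anchor only
const anchored = /^[\w.+-]+$/;     // start and end anchors

const subtype = "json/extra";

console.log(unanchored.test(subtype)); // true: the "json" prefix is enough
console.log(anchored.test(subtype));   // false: "/" is outside the class
```

&lt;p&gt;Without the trailing &lt;code&gt;$&lt;/code&gt;, the pattern only has to match a prefix, so anything after a valid start is never inspected.&lt;/p&gt;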

&lt;h3&gt;
  
  
  Hit 3: stateful global regex
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;MEDIUM · bug&lt;/strong&gt; — The &lt;code&gt;keyValuePairsReg&lt;/code&gt; regex is defined globally with the &lt;code&gt;/g&lt;/code&gt; flag. Because of this, it is stateful and relies on &lt;code&gt;lastIndex&lt;/code&gt;. If parsing throws an exception or future modifications exit the loop early, &lt;code&gt;lastIndex&lt;/code&gt; will not reset to 0.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Confirmed at the top of &lt;code&gt;lib/content-type.js&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;keyValuePairsReg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;([\w&lt;/span&gt;&lt;span class="sr"&gt;!#$%&amp;amp;'*+.^`|~-&lt;/span&gt;&lt;span class="se"&gt;]&lt;/span&gt;&lt;span class="sr"&gt;+&lt;/span&gt;&lt;span class="se"&gt;)&lt;/span&gt;&lt;span class="sr"&gt;=&lt;/span&gt;&lt;span class="se"&gt;([^&lt;/span&gt;&lt;span class="sr"&gt;;&lt;/span&gt;&lt;span class="se"&gt;]&lt;/span&gt;&lt;span class="sr"&gt;*&lt;/span&gt;&lt;span class="se"&gt;)&lt;/span&gt;&lt;span class="sr"&gt;/gm&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Used inside a class constructor with &lt;code&gt;.exec()&lt;/code&gt; in a loop. In healthy execution, &lt;code&gt;lastIndex&lt;/code&gt; resets to 0 when &lt;code&gt;exec&lt;/code&gt; returns &lt;code&gt;null&lt;/code&gt;. But the failure mode — exception inside the loop body, or any future &lt;code&gt;break&lt;/code&gt; — silently corrupts every subsequent parse for the lifetime of the process. The model's suggested fix (use &lt;code&gt;matchAll&lt;/code&gt; instead) is exactly the JavaScript-idiomatic answer.&lt;/p&gt;

&lt;p&gt;This is a latent footgun, not a live bug, so MEDIUM is arguably too high a severity. But it's a real thing the model saw.&lt;/p&gt;
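
&lt;p&gt;The failure mode can be reproduced in a few lines. This is a sketch with a simplified pattern and made-up function names, not the Fastify code: one function reuses a module-level &lt;code&gt;/g&lt;/code&gt; regex and stops after the first match (as an early &lt;code&gt;break&lt;/code&gt; or exception would), the other builds a fresh regex per call.&lt;/p&gt;

```typescript
// Module-level regex with the g flag: lastIndex is shared across calls.
const shared = /(\w+)=([^;]*)/g;

// Simulates a loop that exits early (a break or a thrown error) after one match.
function firstPairShared(input: string): string | null {
  const m = shared.exec(input);
  return m === null ? null : m[0];
}

// Safer variant: a fresh regex per call, so no state leaks between inputs.
function firstPairLocal(input: string): string | null {
  const local = /(\w+)=([^;]*)/g;
  const m = local.exec(input);
  return m === null ? null : m[0];
}

const a = firstPairShared("a=1; b=2"); // "a=1", leaves shared.lastIndex at 3
const b = firstPairShared("c=3");      // null: the search started at index 3
const c = firstPairLocal("c=3");       // "c=3"
console.log(a, b, c);
```

&lt;p&gt;The second call on the shared regex misses a perfectly valid pair because it resumes from the stale &lt;code&gt;lastIndex&lt;/code&gt;. Keeping the regex local to the call is one way to avoid that.&lt;/p&gt;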

&lt;h2&gt;
  
  
  What it didn't catch — the honest part
&lt;/h2&gt;

&lt;p&gt;Two failure modes worth being honest about.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cross-file context.&lt;/strong&gt; The model only sees the diff. It can't tell whether a function called by the changed code is safe, whether a removed branch was load-bearing somewhere else, or whether tests actually cover the new behavior. For PR #6414 in particular, the upstream callers of the new &lt;code&gt;ContentType&lt;/code&gt; class are not in the diff, and the model never reasoned about them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Severity calibration is rough.&lt;/strong&gt; The regex-without-anchor is HIGH. The stateful &lt;code&gt;/g&lt;/code&gt; is MEDIUM. In practice the ordering holds, but both labels probably sit a notch too high: the regex one is a clear pattern with security relevance but no demonstrated exploit, and the global-regex one is a latent footgun unlikely to fire. Junior-reviewer instincts.&lt;/p&gt;

&lt;p&gt;I also can't conclusively measure what the model missed without reviewing every comment thread on the PR by hand. The merged commit went through multiple rounds of feedback (commits like "address feedback", "refactor algorithm", "appease coverage"), so reviewers did catch things, but how many of those are in-diff issues a tool could have seen versus broader design decisions — I'd need another afternoon to know.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd actually use this for
&lt;/h2&gt;

&lt;p&gt;Three takeaways after running this on real code:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;It earns a place as a first-layer pre-review.&lt;/strong&gt; Specifically: PRs that touch parsers, validators, or anything that consumes external input. The cost is around $0.003 per PR. The cost of &lt;em&gt;not&lt;/em&gt; running it is shipping a regex without an anchor on a security-sensitive code path.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It does not replace human reviewers.&lt;/strong&gt; It cannot reason about distributed state, concurrency, transactions, or anything that requires understanding multiple files in concert.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hallucination rate was zero in this sample&lt;/strong&gt; — but the sample is tiny. The literature on similar models suggests false positives in the 15-25% range on real-world PRs. Three out of three being valid is great but is not a benchmark.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The 80 lines of TypeScript that produced this run are on &lt;a href="https://github.com/vicente-r-junior/gemini-code-review" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;. Two non-obvious things about the setup:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;@google/genai&lt;/code&gt; v2 uses &lt;code&gt;responseJsonSchema&lt;/code&gt;, not &lt;code&gt;responseSchema&lt;/code&gt;. Easy to get wrong if you're translating tutorial code written for an older Gemini SDK.&lt;/li&gt;
&lt;li&gt;Public GitHub PRs expose a &lt;code&gt;.diff&lt;/code&gt; endpoint that requires no auth. You don't need octokit for an MVP.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you try it on PRs with shapes I didn't test — concurrency-heavy, multi-file, generated code — tell me what you find. The interesting question is where the model breaks, not where it works.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built and tested in May 2026 with Gemini 3.5 Flash, GA two days before publication.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>googleiochallenge</category>
      <category>devchallenge</category>
      <category>ai</category>
      <category>gemini</category>
    </item>
    <item>
      <title>Finn 💰 — A Personal Finance Assistant That Lives in WhatsApp</title>
      <dc:creator>Vicente Junior</dc:creator>
      <pubDate>Sat, 25 Apr 2026 15:53:31 +0000</pubDate>
      <link>https://dev.to/vicente_junior_dev/finn-a-personal-finance-assistant-that-lives-in-whatsapp-2phh</link>
      <guid>https://dev.to/vicente_junior_dev/finn-a-personal-finance-assistant-that-lives-in-whatsapp-2phh</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/openclaw-2026-04-16"&gt;OpenClaw Challenge&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Finn&lt;/strong&gt; is a personal finance assistant that lives entirely in WhatsApp. No app to install, no dashboard to remember to open — you just message Finn the way you'd message a friend.&lt;/p&gt;

&lt;p&gt;The problem Finn solves is a real one: most personal finance tools require you to change your habits (log into an app, categorize manually, remember later). Finn fits into a habit you already have — checking WhatsApp.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What it can do:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;📝 &lt;strong&gt;Log expenses and income&lt;/strong&gt; via text or voice note (Whisper transcription)&lt;/li&gt;
&lt;li&gt;📄 &lt;strong&gt;Parse credit card PDF invoices&lt;/strong&gt; — send the PDF, review the itemized list, confirm, saved&lt;/li&gt;
&lt;li&gt;🏦 &lt;strong&gt;Parse bank account statement PDFs&lt;/strong&gt; — transactions extracted via a saldo-diff algorithm that works around garbled PDF text&lt;/li&gt;
&lt;li&gt;🔍 &lt;strong&gt;Query spending&lt;/strong&gt; — by period, category, card, or cost center (cross-source: credit cards + bank account + manual entries combined)&lt;/li&gt;
&lt;li&gt;✏️ &lt;strong&gt;Edit or delete&lt;/strong&gt; any saved record by just describing it&lt;/li&gt;
&lt;li&gt;🔒 &lt;strong&gt;Phone whitelist&lt;/strong&gt; — only your number gets a response&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;All four scenarios below are real conversations with a running Finn instance.&lt;/p&gt;




&lt;h3&gt;
  
  
  🟢 Demo 1 — Log a Single Expense
&lt;/h3&gt;

&lt;p&gt;The most common interaction: user mentions an amount and a description. Finn classifies the category and card automatically, shows a confirmation line, and only saves after approval.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You:   189 on lunch
Finn:  $189 · Food · Mastercard · Me · 2026-04-22 — confirm? ✅

You:   yeah
Finn:  ✅ Saved! Anything else?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the description is ambiguous, Finn asks exactly one clarifying question:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You:   spent 80 today
Finn:  What was it for?

You:   pharmacy
Finn:  $80 · Pharmacy · Mastercard · Me · 2026-04-22 — confirm? ✅

You:   yes
Finn:  ✅ Saved!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Voice notes work the same way — Finn echoes the transcription before acting:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You:   🎙️ [audio: "spent ninety at the grocery store today"]
Finn:  🎙️ _"spent ninety at the grocery store today"_

       $90 · Groceries · Mastercard · Me · 2026-04-22 — confirm? ✅

You:   yes
Finn:  ✅ Saved!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  ✏️ Demo 2 — Change Something
&lt;/h3&gt;

&lt;p&gt;You can correct any saved record in plain language. Finn finds the most recent matching transaction, shows what will change, and waits for confirmation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Change the amount:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You:   the pharmacy one was actually 95, not 80
Finn:  Update $80 → $95 · Pharmacy · Mastercard · Me · 2026-04-22 — confirm? ✅

You:   yes
Finn:  ✅ Updated!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Change the category:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You:   that lunch from yesterday, mark it as Entertainment
Finn:  Update Food → Entertainment · $189 · Mastercard · Me · 2026-04-22 — confirm? ✅

You:   yes
Finn:  ✅ Updated!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Delete a record:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You:   delete the $90 grocery entry
Finn:  Delete $90 · Groceries · Mastercard · Me · 2026-04-22 — are you sure? 🗑️

You:   yes
Finn:  ✅ Deleted.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  📄 Demo 3 — Import a Credit Card PDF Invoice
&lt;/h3&gt;

&lt;p&gt;You forward the invoice PDF to the WhatsApp chat. Finn parses it, shows you every line, resolves unknown categories, and bulk-saves on one "yes".&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You:   [sends Mastercard invoice PDF]

Finn:  📄 Mastercard · Due 2026-04-10 · $2,340.00 · 18 items (16 new · 2 skipped)

       📊 Invoice Summary
       Purchases: $2,680.00 · Financed balance (credit): -$520.00 · Interest: +$180.00 · Total due: $2,340.00

       1. ✅ UBER*TRIP
          Mar 12 · $18.50 · Transport · Me

       2. ✅ NETFLIX.COM
          Mar 13 · $15.99 · Entertainment · Me

       3. ✅ AMAZON PRIME
          Mar 14 · $42.30 · Shopping · Me (intl · USD 7.99)

       4. ❓ SQ*DOWNTOWN MKT
          Mar 15 · $34.00 · ❓ · Me

       5. ✅ SHELL GAS STATION
          Mar 17 · $95.00 · Transport · Me

       6. ✅ DELTA AIRLINES 01/03
          Mar 19 · $320.00 · Travel · Me
       ...

       ❓ What category for: #4 SQ*DOWNTOWN MKT ($34.00)?

You:   Groceries

Finn:  Got it. Save all 16 items? (2 duplicates will be skipped)

You:   yes

Finn:  ✅ Saved 16 transactions · Total $2,340.00
       Transport: $210.50 · Shopping: $380.00 · Food: $195.00 · Entertainment: $95.99 · Travel: $320.00 · Other: $1,138.51
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The "Invoice Summary" line reconciles the invoice math: &lt;code&gt;Purchases + Interest + Financed balance = Total due&lt;/code&gt;. Finn stores the interest and financed balance as signed rows so &lt;code&gt;SELECT SUM(amount)&lt;/code&gt; always equals the invoice total.&lt;/p&gt;
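
&lt;p&gt;As a rough illustration of that reconciliation, using the figures from the demo invoice above: the row shape below is an assumption (Finn's actual table schema is not shown here), and the purchases are collapsed into a single row for brevity.&lt;/p&gt;

```typescript
// Illustrative row shape; Finn's actual table schema is not shown in the post.
type InvoiceRow = { label: string; amount: number };

// Amounts from the demo invoice, stored as signed values.
const invoiceRows: InvoiceRow[] = [
  { label: "Purchases", amount: 2680.0 },
  { label: "Financed balance (credit)", amount: -520.0 },
  { label: "Interest", amount: 180.0 },
];

// Sum in integer cents to avoid floating-point drift.
function totalDue(rows: InvoiceRow[]): number {
  let cents = 0;
  for (const row of rows) {
    cents += Math.round(row.amount * 100);
  }
  return cents / 100;
}

console.log(totalDue(invoiceRows)); // 2340
```

&lt;p&gt;Because the credit is stored as a negative amount, a plain sum over the rows lands on the invoice total without any special-casing.&lt;/p&gt;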

&lt;p&gt;For image-based (scanned) PDFs, Finn automatically falls back to GPT-4.1 Vision OCR — same output, no extra steps for the user.&lt;/p&gt;




&lt;h3&gt;
  
  
  🔍 Demo 4 — Query Spending
&lt;/h3&gt;

&lt;p&gt;Finn always queries the database — it never answers from memory or inference.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Total this month:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You:   how much did I spend this month?
Finn:  📊 All spending — last 60 days
       Total: $6,234.18

       Shopping:     $1,402.10
       Food:           $534.90
       Transport:      $487.20
       Entertainment:  $211.70
       Travel:         $420.00
       Other:        $3,178.28
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;By category:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You:   breakdown of Transport
Finn:  🚗 Transport — last 60 days
       $487.20 across 8 transactions

       Mar 15 · Uber              $34.90  (Mastercard)
       Mar 20 · Posto Ipiranga   $180.00  (Mastercard)
       Apr 01 · Uber              $28.50  (Mastercard)
       Apr 05 · Shell Gas        $155.00  (Nu)
       Apr 10 · Parking           $22.00  (Bradesco)
       ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Cross-source (credit card + bank account):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You:   did my card payment hit the bank account?
Finn:  Yes — Mar 31 bank debit of $4,782.24 matches your Mastercard invoice exactly. ✅
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;A specific period:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You:   how much did I spend on Food in March?
Finn:  🍽️ Food — March 2026
       $312.40 across 9 transactions

       Highest: $89.90 at a restaurant on Mar 22
       Daily average: $10.08
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  How I Used OpenClaw
&lt;/h2&gt;

&lt;p&gt;The entire agent is an OpenClaw plugin registered as a &lt;code&gt;before_dispatch&lt;/code&gt; hook. Every WhatsApp message — text, voice, or PDF — passes through Finn before OpenClaw does anything else.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;WhatsApp → OpenClaw gateway → before_dispatch hook → Finn plugin → OpenAI gpt-4.1 → Supabase
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Plugin Registration
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// openclaw.plugin.json&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;name&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;finance-agent&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;version&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;1.0.0&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;hooks&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;before_dispatch&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// plugin/src/index.ts&lt;/span&gt;
&lt;span class="nx"&gt;api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;before_dispatch&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;phone&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;senderId&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;senderId&lt;/span&gt;

  &lt;span class="c1"&gt;// Phone whitelist — only the owner gets responses&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;allowedPhones&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ALLOWED_PHONES&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;,&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;p&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;allowedPhones&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;allowedPhones&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;phone&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;handled&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;  &lt;span class="c1"&gt;// silent ignore for unknown numbers&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;runAgent&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;phone&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;mediaType&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;handled&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;reply&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Agent Loop
&lt;/h3&gt;

&lt;p&gt;The core is a tool-use loop over &lt;code&gt;gpt-4.1&lt;/code&gt; with six tools and at most five iterations:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;save_transaction&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Persist a confirmed expense or income entry&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;query_spending&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Query totals, breakdowns, history from Supabase&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;save_bulk_transactions&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Bulk-save confirmed invoice items from a PDF&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;save_bank_statement&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Bulk-save confirmed bank statement rows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;update_transaction&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Edit a saved record after confirmation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;delete_transaction&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Delete a record after confirmation&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
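&lt;p&gt;The capped loop can be sketched roughly as follows. This is not Finn's code: the &lt;code&gt;Model&lt;/code&gt; and &lt;code&gt;ToolRunner&lt;/code&gt; types, the stubs, and the fallback message are all hypothetical, and synchronous stubs stand in for the real OpenAI and Supabase calls to keep the example short:&lt;/p&gt;

```typescript
const MAX_ITERATIONS = 5

// Hypothetical shapes; the real OpenAI message types are richer than this.
type ToolCall = { name: string; args: { [key: string]: unknown } }
type ModelTurn = { reply?: string; toolCall?: ToolCall }
type Model = (history: string[]) => ModelTurn
type ToolRunner = (call: ToolCall) => string

function runLoop(model: Model, runTool: ToolRunner, userText: string): string {
  const history: string[] = [`user: ${userText}`]
  for (let i = 0; i !== MAX_ITERATIONS; i++) {
    const turn = model(history)
    // A turn without a tool call is the final answer.
    if (turn.toolCall === undefined) {
      return turn.reply ?? ''
    }
    // Otherwise run the tool and feed its result back to the model.
    history.push(`tool ${turn.toolCall.name}: ${runTool(turn.toolCall)}`)
  }
  // Cap reached: stop instead of looping forever (message is illustrative).
  return 'Sorry, I could not finish that request.'
}

// Stub model: asks for query_spending once, then answers from the tool result.
const stubModel: Model = (history) =>
  history.length === 1
    ? { toolCall: { name: 'query_spending', args: { category: 'Transport' } } }
    : { reply: `Answer based on: ${history[history.length - 1]}` }

// Stub tool: returns a canned result instead of querying a database.
const stubTool: ToolRunner = () => '$487.20 across 8 transactions'

console.log(runLoop(stubModel, stubTool, 'breakdown of Transport'))
```

&lt;p&gt;The cap matters when the model keeps requesting tools: the loop ends with a fallback reply rather than running indefinitely.&lt;/p&gt;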

&lt;p&gt;The system prompt defines a strict state machine: the LLM must never call &lt;code&gt;save_transaction&lt;/code&gt; without explicit user confirmation. The confirmation prompt always uses the same canonical format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$45 · Food · Mastercard · Me · 2026-04-22 — confirm? ✅
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Preventing LLM Hallucination on Queries
&lt;/h3&gt;

&lt;p&gt;Even with a well-crafted prompt saying "always call query_spending", the model would sometimes answer "you spent $X on Transport" by inferring from a recently parsed PDF still in context instead of querying the database. The fix: force &lt;code&gt;tool_choice&lt;/code&gt; on the first iteration for any spending question:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;SPENDING_Q_RE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sr"&gt;/how much|breakdown|what did I spend/i&lt;/span&gt;
&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;toolChoice&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;SPENDING_Q_RE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userText&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;function&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;function&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;query_spending&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;auto&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  PDF Invoice Pipeline
&lt;/h3&gt;

&lt;p&gt;When a PDF arrives, the plugin routes it before the LLM ever sees it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;pdfText&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Scanned/image-based PDF → GPT-4.1 Vision OCR&lt;/span&gt;
  &lt;span class="nx"&gt;invoice&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;parseInvoiceOcr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;pdfToImages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;pdfBuffer&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/Extrato de:.*Agência/i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;pdfText&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Bank statement&lt;/span&gt;
  &lt;span class="nx"&gt;stmt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;parseStatementBradesco&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;pdfText&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Text-based credit card invoice&lt;/span&gt;
  &lt;span class="nx"&gt;invoice&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;parseInvoice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;pdfText&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. PDF parsing is harder than it looks.&lt;/strong&gt; Text extraction with &lt;code&gt;pdf-parse&lt;/code&gt; is reliable for prose but unreliable for table columns: numbers get concatenated with adjacent reference codes. The saldo-diff (balance-difference) approach was a counterintuitive fix: instead of parsing the value I want, compute it from context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. "This month" is not a calendar concept for credit cards.&lt;/strong&gt; A purchase on March 8 appears on an April invoice — so a filter of &lt;code&gt;date &amp;gt;= April 1&lt;/code&gt; would miss it. Finn uses a 60-day rolling window for "this month" queries to cover the billing cycle lag.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Forcing &lt;code&gt;tool_choice&lt;/code&gt; prevents silent hallucination.&lt;/strong&gt; The model reliably answers from database queries when forced, and sometimes "just knows" from context when not forced. Both answers look correct, but only the first is backed by the database.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. WhatsApp as an interface has a real adoption advantage.&lt;/strong&gt; The friction of opening a dedicated finance app is a major reason people stop using one. A chat interface that's already open all day adds almost no switching cost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Security in layers.&lt;/strong&gt; OpenClaw's &lt;code&gt;allowFrom&lt;/code&gt; whitelist blocks at the gateway level. &lt;code&gt;ALLOWED_PHONES&lt;/code&gt; adds an application-level check. Supabase rows are scoped by &lt;code&gt;phone&lt;/code&gt; with RLS. Each layer is independent — if one fails, the others still hold.&lt;/p&gt;
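&lt;p&gt;The 60-day rolling window from point 2 can be sketched like this. The function names are hypothetical and dates are handled in UTC as plain ISO strings, which is a simplification rather than a description of how Finn stores them:&lt;/p&gt;

```typescript
const WINDOW_DAYS = 60
const MS_PER_DAY = 24 * 60 * 60 * 1000

// Start of the rolling window, as an ISO date (YYYY-MM-DD) in UTC.
function windowStart(today: Date, days: number = WINDOW_DAYS): string {
  return new Date(today.getTime() - days * MS_PER_DAY).toISOString().slice(0, 10)
}

// ISO dates sort correctly as plain strings, so >= is enough here.
function inWindow(isoDate: string, today: Date): boolean {
  return isoDate >= windowStart(today)
}

const today = new Date('2026-04-22T00:00:00Z')
console.log(windowStart(today))            // 2026-02-21
console.log(inWindow('2026-03-08', today)) // true: the rolling window keeps it
console.log('2026-03-08' >= '2026-04-01')  // false: a calendar-month filter drops it
```

&lt;p&gt;The trade-off is that "this month" now covers roughly two calendar months, which is why Finn's replies label the period as "last 60 days".&lt;/p&gt;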




&lt;h2&gt;
  
  
  Tech Stack
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Channel&lt;/td&gt;
&lt;td&gt;WhatsApp via OpenClaw&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Runtime&lt;/td&gt;
&lt;td&gt;TypeScript, Node.js 20&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Framework&lt;/td&gt;
&lt;td&gt;OpenClaw (&lt;code&gt;before_dispatch&lt;/code&gt; hook)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLM&lt;/td&gt;
&lt;td&gt;OpenAI gpt-4.1 (tool-use loop)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Database&lt;/td&gt;
&lt;td&gt;Supabase (PostgreSQL + Row Level Security)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PDF parsing&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;pdf-parse&lt;/code&gt; + custom text parsers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vision OCR&lt;/td&gt;
&lt;td&gt;GPT-4.1 Vision (scanned PDFs)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Audio&lt;/td&gt;
&lt;td&gt;OpenAI Whisper&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deployment&lt;/td&gt;
&lt;td&gt;pm2 on a VPS&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/vicente-r-junior/finn" rel="noopener noreferrer"&gt;github.com/vicente-r-junior/finn&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>openclawchallenge</category>
    </item>
    <item>
      <title>Notion Life Review OS — Log your day to Notion from WhatsApp using AI</title>
      <dc:creator>Vicente Junior</dc:creator>
      <pubDate>Sun, 29 Mar 2026 12:26:25 +0000</pubDate>
      <link>https://dev.to/vicente_junior_dev/notion-life-review-os-log-your-day-to-notion-from-whatsapp-using-ai-3g3m</link>
      <guid>https://dev.to/vicente_junior_dev/notion-life-review-os-log-your-day-to-notion-from-whatsapp-using-ai-3g3m</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/notion-2026-03-04"&gt;Notion MCP Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Notion Life Review OS&lt;/strong&gt; is a WhatsApp assistant that captures your day and organizes everything in your own Notion workspace — from a single message.&lt;/p&gt;

&lt;p&gt;You send something like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Worked on the API integration today. Need to present to the client next Thursday. Also figured out why our Redis connection was dropping."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It extracts a task, a project, a learning, and your mood. Asks you to confirm. Saves everything to the right Notion database. No forms. No clicking. No friction.&lt;/p&gt;

&lt;p&gt;The core idea is simple: your day lives in WhatsApp already. You're already typing there. So why open another tool?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It also works the other way.&lt;/strong&gt; Ask it anything:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"What tasks are due this week?"&lt;br&gt;
"What did I learn this week?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And you can manage your Notion schema directly from WhatsApp — even via voice:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Add a column called Who, select type, to the Tasks table"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The new field is available on the very next message.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One thing I really liked about how this came together:&lt;/strong&gt; the project and task structure is completely generic. You can use it for work — a project called "API Backend" with tasks like "Deploy to production". But it works just as well for a grocery list — project "Supermarket", tasks "milk, eggs, bread". Or a personal to-do list. The system doesn't care. It just captures what you tell it and puts it in the right place.&lt;/p&gt;

&lt;h2&gt;
  
  
  Video Demo
&lt;/h2&gt;

&lt;p&gt;&lt;iframe src="https://player.mux.com/Tkqz82m9uzGSX01TCH2awdaxibvQtAjBQ6p5DvuwpHGg" width="710" height="399"&gt;
&lt;/iframe&gt;

&lt;/p&gt;

&lt;h2&gt;
  
  
  Show me the code
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/vicente-r-junior/notion-life-review-os" rel="noopener noreferrer"&gt;github.com/vicente-r-junior/notion-life-review-os&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Full setup instructions in the README.&lt;/p&gt;

&lt;h2&gt;
  
  
  How I Used Notion MCP
&lt;/h2&gt;

&lt;p&gt;Notion MCP is the backbone of the entire system. Every single interaction with Notion goes through it — no direct API calls anywhere.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reading schema at startup&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When the app boots, it calls &lt;code&gt;API-retrieve-a-database&lt;/code&gt; and &lt;code&gt;API-retrieve-a-data-source&lt;/code&gt; for each of the 5 databases. The schemas get cached in Redis and injected directly into the GPT-4o system prompt — so the agent knows what fields exist, what types they are, and which ones are required, without any extra calls per message.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Writing data&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When the user confirms, the app calls &lt;code&gt;API-post-page&lt;/code&gt; for each item — daily log, tasks, projects, learnings. This part is pure deterministic Python, not an LLM. The write step is too important to leave non-deterministic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Querying data&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For questions like "what tasks are due this week?", the agent uses &lt;code&gt;API-query-data-source&lt;/code&gt; with structured filters built from natural language. It resolves dates, applies status filters, and formats the answer for WhatsApp.&lt;/p&gt;
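&lt;p&gt;As an illustration of the date-resolution step, here is a small sketch that turns "due this week" into a Notion-style date filter. The project itself is Python; this sketch uses TypeScript, and the property name &lt;code&gt;Due&lt;/code&gt;, the Monday-to-Sunday week, and the UTC handling are all assumptions (the real app is configured with a &lt;code&gt;TIMEZONE&lt;/code&gt;):&lt;/p&gt;

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000

function isoDay(d: Date): string {
  return d.toISOString().slice(0, 10)
}

// Monday-to-Sunday week containing `today`, computed in UTC.
function weekRange(today: Date): { start: string; end: string } {
  const daysSinceMonday = (today.getUTCDay() + 6) % 7
  const monday = new Date(today.getTime() - daysSinceMonday * MS_PER_DAY)
  const sunday = new Date(monday.getTime() + 6 * MS_PER_DAY)
  return { start: isoDay(monday), end: isoDay(sunday) }
}

// Compound filter in the shape the Notion API uses for date properties.
// 'Due' is a hypothetical property name.
function dueThisWeekFilter(today: Date, property: string = 'Due') {
  const { start, end } = weekRange(today)
  return {
    and: [
      { property, date: { on_or_after: start } },
      { property, date: { on_or_before: end } },
    ],
  }
}

console.log(JSON.stringify(dueThisWeekFilter(new Date('2026-03-25T12:00:00Z')), null, 2))
```

&lt;p&gt;Resolving the dates in code and handing the model's output only the intent ("this week") keeps the filter itself deterministic.&lt;/p&gt;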

&lt;p&gt;&lt;strong&gt;Updating schema dynamically&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When the user asks to add a column — even via voice — the app calls &lt;code&gt;API-update-a-data-source&lt;/code&gt;. The Redis cache refreshes immediately and the system prompt is rebuilt. The new field is available on the next message.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bulk updates&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For things like "set all tasks Who to Vicente", the app queries first, shows a confirmation with the affected records, then calls &lt;code&gt;API-patch-page&lt;/code&gt; for each one.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;WhatsApp → Evolution API → FastAPI webhook
                               ↓
                     Intent classifier (GPT-4o-mini)
                               ↓
           ┌───────────────────┼──────────────────┐
           ↓                   ↓                  ↓
 Conversational agent     Query agent      Add column flow
      (GPT-4o)             (GPT-4o)        (GPT-4o-mini)
           ↓                   ↓                  ↓
      SAVE_PAYLOAD        Notion MCP         Notion MCP
           ↓               (query)          (update schema)
    User confirms
           ↓
     Notion Writer
     (pure Python)
           ↓
       Notion MCP
      (write pages)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;One conversational agent instead of a pipeline.&lt;/strong&gt;&lt;br&gt;
I started with separate extractor, matcher, and confirmation agents. It was complex and fragile. A single GPT-4o call with Redis conversation history turned out to be simpler, faster, and much easier to debug. The agent holds the full context of the conversation and knows when it has enough information to produce a &lt;code&gt;SAVE_PAYLOAD&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The write step is never an LLM.&lt;/strong&gt;&lt;br&gt;
The &lt;code&gt;notion_writer&lt;/code&gt; is pure Python calling Notion MCP directly. Every property format is handled explicitly. Giving an LLM direct write access to your Notion is asking for trouble.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Schema-aware prompts.&lt;/strong&gt;&lt;br&gt;
The agent knows your exact Notion schema at all times. Custom fields like &lt;em&gt;Who&lt;/em&gt;, &lt;em&gt;Priority&lt;/em&gt;, or &lt;em&gt;Energy&lt;/em&gt; are injected into the system prompt dynamically. If a field is marked required, the agent asks for it before saving — no partial records.&lt;/p&gt;
&lt;h2&gt;
  
  
  Stack
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Backend&lt;/td&gt;
&lt;td&gt;Python 3.12 + FastAPI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI&lt;/td&gt;
&lt;td&gt;OpenAI GPT-4o + GPT-4o-mini + Whisper&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Notion interface&lt;/td&gt;
&lt;td&gt;Notion MCP (&lt;code&gt;mcp/notion&lt;/code&gt; Docker image)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;WhatsApp bridge&lt;/td&gt;
&lt;td&gt;Evolution API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Session + cache&lt;/td&gt;
&lt;td&gt;Redis 7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Infrastructure&lt;/td&gt;
&lt;td&gt;Docker Compose on Hostinger VPS&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h2&gt;
  
  
  Setup
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Clone and configure&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/vicente-r-junior/notion-life-review-os.git
&lt;span class="nb"&gt;cd &lt;/span&gt;notion-life-review-os
&lt;span class="nb"&gt;cp&lt;/span&gt; .env.example .env
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. Create 5 Notion databases&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Inside a parent page called &lt;strong&gt;Life Review OS&lt;/strong&gt;, create:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Daily Logs&lt;/li&gt;
&lt;li&gt;Tasks&lt;/li&gt;
&lt;li&gt;Projects&lt;/li&gt;
&lt;li&gt;Learnings&lt;/li&gt;
&lt;li&gt;Weekly Reports&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Copy each database ID into &lt;code&gt;.env&lt;/code&gt;. Connect your Notion integration to the parent page — it propagates to all children automatically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Configure &lt;code&gt;.env&lt;/code&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;OPENAI_API_KEY=sk-...
NOTION_API_KEY=secret_...
NOTION_DB_DAILY_LOGS=...
NOTION_DB_TASKS=...
NOTION_DB_PROJECTS=...
NOTION_DB_LEARNINGS=...
NOTION_DB_WEEKLY_REPORTS=...
MCP_AUTH_TOKEN=any-random-string
EVOLUTION_API_URL=http://your-evolution-api:8080
EVOLUTION_API_KEY=...
EVOLUTION_INSTANCE=your-instance-name
WHATSAPP_NUMBER=5511999999999
REDIS_URL=redis://app-redis:6379
TIMEZONE=America/Sao_Paulo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;4. Start&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Point your Evolution API webhook to &lt;code&gt;http://your-server:8000/webhook&lt;/code&gt; and you're live.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;MCP response parsing trips you up the first time.&lt;/strong&gt; Every Notion MCP response is SSE-wrapped JSON inside a &lt;code&gt;content&lt;/code&gt; array. Once you have the unwrapping pattern it's trivial — but it's not obvious when you first hit it.&lt;/p&gt;
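&lt;p&gt;A rough sketch of that unwrapping pattern, in TypeScript rather than the project's Python. It assumes the response body is a server-sent-events stream whose last &lt;code&gt;data:&lt;/code&gt; line holds a JSON-RPC envelope, and that the Notion payload is a JSON string in the first &lt;code&gt;text&lt;/code&gt; item of &lt;code&gt;result.content&lt;/code&gt;. The function name and the sample payload are hypothetical:&lt;/p&gt;

```typescript
// Unwrap: SSE body -> JSON-RPC envelope -> result.content[0].text -> payload.
// Simplification: multi-line SSE events are not joined; the last data line wins.
function unwrapMcpResponse(sseBody: string): unknown {
  const dataLines = sseBody
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.startsWith('data:'))
  const last = dataLines.pop()
  if (last === undefined) {
    throw new Error('No data line in SSE body')
  }
  const envelope = JSON.parse(last.slice('data:'.length))
  const first = envelope?.result?.content?.[0]
  if (first?.type !== 'text') {
    throw new Error('Expected a text item in result.content')
  }
  // The text item is itself a JSON string, so it needs a second parse.
  return JSON.parse(first.text)
}

// Hypothetical sample, built with JSON.stringify to avoid manual escaping.
const inner = JSON.stringify({ object: 'list', results: [] })
const sample =
  'event: message\ndata: ' +
  JSON.stringify({
    jsonrpc: '2.0',
    id: 1,
    result: { content: [{ type: 'text', text: inner }] },
  })

console.log(unwrapMcpResponse(sample))
```

&lt;p&gt;The double parse is the part that is easy to miss: one pass for the envelope, a second for the text item inside it.&lt;/p&gt;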

&lt;p&gt;&lt;strong&gt;One agent beats a pipeline.&lt;/strong&gt; I built the multi-agent version first. Extractor, matcher, confirmation, writer — each doing one thing. It looked clean on paper and was a nightmare in practice. Replacing it with a single conversational GPT-4o call and Redis history was the best decision I made on this project.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The write step should never be an LLM.&lt;/strong&gt; Flexible conversation on the way in, deterministic code on the way out. That's the pattern that worked.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt design is the real work.&lt;/strong&gt; Getting the agent to always include &lt;code&gt;SAVE_PAYLOAD&lt;/code&gt; when there's actionable content, never say "done" without confirming, correctly handle corrections mid-conversation — that's where most of the iteration went. The code was the easy part.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Redis for everything.&lt;/strong&gt; Session state, schema cache, idempotency keys, conversation history — all in Redis with TTLs. No separate database needed. Cleanup is automatic.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>notionchallenge</category>
      <category>mcp</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
