<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Saulo Linares</title>
    <description>The latest articles on DEV Community by Saulo Linares (@saulolinares10).</description>
    <link>https://dev.to/saulolinares10</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3929476%2F09e99a6c-4a14-41fa-b3ba-95da4f3c31f9.jpeg</url>
      <title>DEV Community: Saulo Linares</title>
      <link>https://dev.to/saulolinares10</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/saulolinares10"/>
    <language>en</language>
    <item>
      <title>The most important AI use case isn't in Silicon Valley</title>
      <dc:creator>Saulo Linares</dc:creator>
      <pubDate>Fri, 29 May 2026 00:08:35 +0000</pubDate>
      <link>https://dev.to/saulolinares10/the-most-important-ai-use-case-isnt-in-silicon-valley-4mce</link>
      <guid>https://dev.to/saulolinares10/the-most-important-ai-use-case-isnt-in-silicon-valley-4mce</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/hermes-agent-2026-05-15"&gt;Hermes Agent Challenge&lt;/a&gt;: Write About Hermes Agent&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The informal economy accounts for more than 60% of employment in Latin America. In Venezuela, the number is higher; economists stopped agreeing on exactly how high after the formal economy contracted by 80% between 2013 and 2021. Most of that activity is not tracked in software. It's tracked in notebooks, in memory, in WhatsApp voice messages sent to a cousin who writes things down. The people running these businesses are not less intelligent than the people who use enterprise software. They just operate in an environment that enterprise software was not designed to reach.&lt;/p&gt;

&lt;p&gt;I'm Venezuelan. I know this not from a report but from El Cafetal — from the abastos that stayed open through hyperinflation and blackouts, from a family that ran small businesses with worn notebooks and sharp memories, from watching a commercial ecosystem survive things that no business school case study would describe as survivable. When I started building on Claude's API, that background didn't leave me. It kept raising the same question: who is this for, and who does it never reach?&lt;/p&gt;

&lt;p&gt;The last five years of AI progress have produced extraordinary tools. Almost none of them are accessible to the majority of the world's workers. Not because the technology couldn't help them — because the design assumptions exclude them before they even try. You need an account. You need a credit card, or at least a bank. You need to know what a "workspace" is, what an "integration" is, what "onboarding" means. You need English, or enough of it to navigate a settings screen. You need the time and patience to learn a new interface when the old one — the notebook, the memory — already works.&lt;/p&gt;

&lt;p&gt;These aren't small barriers. They are the product. When a team builds an AI tool and asks "how do we get users to set up their account," they have already decided who their users are. The person doing inventory in a Caracas abasto with a pre-paid phone plan is not in that conversation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fki7ik77j4a8i7d7p5z45.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fki7ik77j4a8i7d7p5z45.png" alt=" " width="800" height="391"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;WhatsApp is a different kind of infrastructure. Two billion people use it. In Latin America, Africa, and Southeast Asia it is not a messaging app — it is the application layer of daily economic life. It is how you send money, how you confirm a delivery, how you coordinate with the supplier who doesn't have email. Any AI system that wants to reach these users has to live there, inside that interface, in that language, without asking anything new of the user except to send a message the way they already send messages. That's the design constraint that matters.&lt;/p&gt;




&lt;p&gt;There's a version of this argument that leads to chatbots, and I want to be specific about why chatbots aren't the answer. A chatbot forgets. Every conversation starts from zero. You tell it you sell harina de maíz, café, aceite, and refrescos. Next time you open the chat, it has no idea. That is not useful to someone running a real business. That's a search engine with worse latency.&lt;/p&gt;

&lt;p&gt;An agent is different in one specific way: it builds a model of your context over time. After 30 days, it knows that Fridays are high volume. It knows you restock flour on Tuesdays. It knows your best margin is on coffee, that the Polar supplier is reliable, that the Caribe supplier has been inconsistent. None of that was programmed in as rules. It accumulated from the interactions themselves — from inventory updates and sale logs and weekly summaries, each one adding a layer to a picture of this specific business.&lt;/p&gt;

&lt;p&gt;This is what Hermes Agent's episodic memory and skill accumulation actually mean in practice — not a technical feature, but the difference between a toy and a tool. The skill loop isn't about making the agent smarter in the abstract. It's about making it smarter about &lt;em&gt;this&lt;/em&gt; business, &lt;em&gt;this&lt;/em&gt; owner, &lt;em&gt;this&lt;/em&gt; set of products that have specific names in Venezuelan informal speech that no generic NLP pipeline handles well.&lt;/p&gt;

&lt;p&gt;"Me cayó un bulto de harina" and "llegaron 50 kg de harina" are the same inventory update. A rule-based system catches one of them. Claude catches both, and the fifty other ways someone might say the same thing, because it understands the sentence rather than matching a pattern. The accumulation of that context — stored in Hermes's persistent memory, refined by the skill system every 15 interactions — is what turns a transaction-processing bot into something closer to a partner.&lt;/p&gt;
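&lt;p&gt;A minimal sketch of why the rule-based approach falls short, assuming a hypothetical hand-written rule for a single phrasing. The regex and the &lt;code&gt;parse_with_rule&lt;/code&gt; name are illustrative and are not taken from the Vecino repo:&lt;/p&gt;

```python
import re

# Hypothetical hand-written rule for one phrasing: "llegaron N UNIDAD de PRODUCTO".
STOCK_IN_RULE = re.compile(r"^llegaron\s+(\d+)\s+(\w+)\s+de\s+(.+)$", re.IGNORECASE)


def parse_with_rule(message):
    """Return a STOCK_IN dict if the message matches the rule, else None."""
    match = STOCK_IN_RULE.match(message.strip())
    if match is None:
        return None
    quantity, unit, product = match.groups()
    return {
        "intent": "STOCK_IN",
        "quantity": int(quantity),
        "unit": unit,
        "product": product,
    }


messages = ["llegaron 50 kg de harina", "Me cayó un bulto de harina"]
results = {message: parse_with_rule(message) for message in messages}
for message, parsed in results.items():
    print(message, "=", parsed)
```

&lt;p&gt;The first message parses and the second returns &lt;code&gt;None&lt;/code&gt;, even though both describe the same delivery. Every new phrasing would need another rule, which is the gap the language model closes.&lt;/p&gt;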

&lt;p&gt;Not a partner with opinions. Just something that holds the history so the owner doesn't have to, and surfaces patterns at the right moment without being asked.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"Es viernes — suele entrar más harina los viernes."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuzkjmg3vkr28kioxep2g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuzkjmg3vkr28kioxep2g.png" alt=" " width="708" height="249"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That observation, generated from four weeks of skill records, is not impressive as a demo. It is useful as a business tool. Those are not the same thing, and most demos optimize for the former.&lt;/p&gt;




&lt;p&gt;Building Vecino — a Hermes Agent-powered WhatsApp assistant for Latin American small businesses — clarified something I had understood abstractly but not concretely. The architecture that works for an abasto in El Cafetal is not a specialized, stripped-down version of enterprise software. It is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Persistent memory across sessions (Hermes MEMORY.md + FTS5 recall)&lt;/li&gt;
&lt;li&gt;Proactive scheduling (Hermes native cron, no custom infrastructure)&lt;/li&gt;
&lt;li&gt;A familiar interface (WhatsApp, native Hermes messaging gateway)&lt;/li&gt;
&lt;li&gt;A language model that handles informal speech in the language the user actually speaks (Claude, prompt written in Spanish, not translated)&lt;/li&gt;
&lt;li&gt;Event hooks that act without being asked (low_stock_alert fires on every inventory write)&lt;/li&gt;
&lt;li&gt;Subagent delegation so the main agent stays responsive while summaries are being formatted&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://www.youtube.com/watch?v=3uCOtGmXepw" rel="noopener noreferrer"&gt;https://www.youtube.com/watch?v=3uCOtGmXepw&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That architecture is not a compromise. It is better, for most businesses, than the dashboard-and-integration stack that enterprise software sells. Most of the world's businesses don't need a dashboard. They need something that shows up at 9pm with a summary of the day and remembers what you told it three weeks ago.&lt;/p&gt;




&lt;p&gt;The question I keep coming back to: what would the AI product landscape look like if the default design assumption was a WhatsApp number instead of an email address? Not as an edge case, not as a "localization" effort — as the primary interface.&lt;/p&gt;

&lt;p&gt;How many of the products being built right now would be designed differently? How many would be more useful to more people?&lt;/p&gt;

&lt;p&gt;The answer is most of them. Email as the identity layer and the dashboard as the primary interface are not universal truths about how software should work. They are decisions made by people who have email addresses and use dashboards, building for other people who have email addresses and use dashboards.&lt;/p&gt;

&lt;p&gt;I don't know if Vecino will find its way to El Cafetal. Distribution is a harder problem than architecture. But the architecture exists now. The repo is open. And if someone in Lagos or Jakarta or Medellín looks at this and thinks: &lt;em&gt;I know the version of this that works for my informal economy, my language, my products&lt;/em&gt; — every piece is there.&lt;/p&gt;

&lt;p&gt;The most important AI use case isn't in San Francisco. It's wherever someone is doing the math in their head because no system ever bothered to reach them.&lt;/p&gt;

&lt;p&gt;We have the technology to change that. The question is whether we bother to design for it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Saulo Linares · Born in Caracas · Building in Bogotá, Colombia&lt;/em&gt;&lt;br&gt;
&lt;em&gt;&lt;a href="https://www.linkedin.com/in/saulolinares/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; · &lt;a href="https://github.com/saulolinares10/vecino" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>hermesagentchallenge</category>
      <category>devchallenge</category>
      <category>agents</category>
    </item>
    <item>
      <title>How Hermes AI Agent can help corner "kioskos" in Caracas, Venezuela</title>
      <dc:creator>Saulo Linares</dc:creator>
      <pubDate>Fri, 29 May 2026 00:00:11 +0000</pubDate>
      <link>https://dev.to/saulolinares10/how-hermes-ai-agent-can-help-corner-kioskos-in-caracas-venezuela-1lh1</link>
      <guid>https://dev.to/saulolinares10/how-hermes-ai-agent-can-help-corner-kioskos-in-caracas-venezuela-1lh1</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/hermes-agent-2026-05-15"&gt;Hermes Agent Challenge&lt;/a&gt;: Build With Hermes Agent&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;What I Built&lt;/h2&gt;

&lt;p&gt;There is a kiosko on Avenida El Limón in El Cafetal, Caracas called Wuilander. I walked past it hundreds of times growing up. It has 134,000 Instagram followers — more than most startups — because it became a symbol of something: a Venezuelan small business that survived everything and kept going.&lt;/p&gt;

&lt;p&gt;El Cafetal is a middle-class neighborhood in eastern Caracas that lived through what Venezuela lived through: hyperinflation that peaked at 130,000% in a single year, rolling blackouts, four currency conversions in a decade, and more than 7 million people leaving the country, including, eventually, me.&lt;/p&gt;

&lt;p&gt;The businesses that stayed open did so on memory, on trust, and on WhatsApp. Not on software. Not on systems. There was no Shopify for this. No QuickBooks. No Stripe. The tools that exist assume things that aren't true in the informal Latin American economy: a credit card, a stable email address, reliable electricity, English literacy, time to learn something new.&lt;/p&gt;

&lt;p&gt;I'm a Data Lead working in PE consulting in Colombia, and I build on Claude's API in my spare time. But I kept thinking about Wuilander, and about what it would look like if the default design assumption for an AI tool was a WhatsApp number instead of an email address.&lt;/p&gt;

&lt;p&gt;So I built &lt;strong&gt;Vecino&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqg0iqvq0w05barmmd0yv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqg0iqvq0w05barmmd0yv.png" alt=" " width="799" height="392"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Vecino is a Hermes Agent-powered WhatsApp business assistant for Latin American small businesses. No app to download. No dashboard to learn. No onboarding. You talk to it the way you already talk to everyone — on WhatsApp, in Spanish, in the informal register of someone who grew up in El Cafetal.&lt;/p&gt;

&lt;p&gt;You tell it what arrived:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"llegaron 48 bolsas de café"&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You ask what you have:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"cuánto tengo de arroz"&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You log a sale:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"vendí 5 aceites"&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Every night at 9pm, without being asked, it sends you a summary of the day. Every Monday at 8am, the week's P&amp;amp;L. After 30 days, it knows your business patterns better than the notebook did.&lt;/p&gt;

&lt;p&gt;It's named Vecino — the neighbor — because that's what it is. The one who knows your business, shows up every day, never forgets.&lt;/p&gt;




&lt;h2&gt;Demo&lt;/h2&gt;

&lt;p&gt;The demo below shows Vecino in action: a pixel-perfect WhatsApp simulation on the left, and the Hermes Agent execution layer on the right — showing in real time how each message flows through the agent: intent parsing, memory reads and writes, skill loading, event hooks firing, and the scheduled 9pm summary arriving automatically.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg0ozjvgv7gj0lkb9465m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg0ozjvgv7gj0lkb9465m.png" alt=" " width="800" height="392"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;🎥 Video walkthrough: watch the conversation play out and the Hermes execution trace update in real time.&lt;br&gt;
&lt;a href="https://youtu.be/3uCOtGmXepw" rel="noopener noreferrer"&gt;https://youtu.be/3uCOtGmXepw&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;🔗 &lt;a href="https://github.com/saulolinares10/vecino" rel="noopener noreferrer"&gt;Interactive demo&lt;/a&gt; · &lt;a href="https://github.com/saulolinares10/vecino" rel="noopener noreferrer"&gt;github.com/saulolinares10/vecino&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Three moments worth watching:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;"llegaron 48 bolsas de café"&lt;/strong&gt; → watch MEMORY write and SKILL load appear in the execution trace within the same second&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;"vendí 5 aceites"&lt;/strong&gt; → watch EVENT HOOKS fire &lt;code&gt;low_stock_alert&lt;/code&gt; automatically — the owner never asked for the warning&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;9:00 PM&lt;/strong&gt; → CRON job triggers, SUBAGENT spawns to format the summary, the message arrives without any user input&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;Code&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/saulolinares10/vecino" rel="noopener noreferrer"&gt;github.com/saulolinares10/vecino&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SOUL.md — the agent's personality file&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Vecino — Identidad del Agente&lt;/span&gt;

Eres Vecino. Un asistente de negocios para abastos y tiendas pequeñas
en América Latina. Vives en WhatsApp.

&lt;span class="gu"&gt;## Personalidad&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Cálido, directo, como un vecino de confianza — no un sistema
&lt;span class="p"&gt;-&lt;/span&gt; Nunca dices "procesando su solicitud" ni "entendido, procederé"
&lt;span class="p"&gt;-&lt;/span&gt; Dices "listo", "anotado", "ojo con esto", "que descanses"
&lt;span class="p"&gt;-&lt;/span&gt; Usas emojis con moderación: ☕ 🌾 ⚠️ 🌙 🙌
&lt;span class="p"&gt;-&lt;/span&gt; Respondes corto — esto es WhatsApp, no un correo
&lt;span class="p"&gt;-&lt;/span&gt; Cuando el inventario está bajo, lo mencionas sin que te pregunten

&lt;span class="gu"&gt;## Idioma&lt;/span&gt;
Siempre en español. Registro informal venezolano/latinoamericano.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Hermes loads &lt;code&gt;SOUL.md&lt;/code&gt; as a context file automatically — it shapes every response the agent generates. This is how you give an AI agent a personality that fits a specific cultural context without fine-tuning a model. The line "Nunca dices 'procesando su solicitud'" is doing real work: it's the difference between a chatbot and a neighbor.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Intent parsing in Spanish — agent/nlp.py&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;INTENT_SYSTEM&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="se"&gt;\
&lt;/span&gt;&lt;span class="s"&gt;Eres el cerebro de Vecino, un agente de inteligencia de negocios para pequeños comerciantes latinoamericanos.
Tu trabajo es analizar mensajes de WhatsApp en español e identificar la intención del negociante y las entidades relevantes.

Responde SIEMPRE con JSON válido. Sin explicaciones. Sin texto extra. Solo el JSON.

Intenciones:
STOCK_IN    — &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llegaron 50 cajas de harina&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;entró un bulto de arroz&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;recibí 200 unidades de aceite&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;
SALE_LOG    — &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vendí 5 bolsas de café&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;salieron 12 unidades de refresco hoy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;
STOCK_QUERY — &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cuánto tengo de arroz&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;qué me queda de aceite&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;
SUMMARY_REQUEST — &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;resumen del día&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cómo voy hoy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dame el cierre&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;
UNKNOWN     — no se puede determinar la intención
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Claude is not optional here. "llegaron", "me trajeron", "recibí", "entró mercancía" all mean the same thing — and a Venezuelan shopkeeper will use all four in the same week. Keyword matching cannot handle this. Claude parses intent reliably across all of them because it understands context, not just surface patterns. The prompt is written in Spanish, not English translated into Spanish — that distinction matters.&lt;/p&gt;
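&lt;p&gt;Because the prompt demands JSON only, the calling code still has to defend against a reply that is not valid JSON or that names an unexpected intent. A minimal sketch of that validation step follows. The intent labels come from the prompt above, while the JSON keys (&lt;code&gt;intent&lt;/code&gt;, &lt;code&gt;product&lt;/code&gt;, &lt;code&gt;quantity&lt;/code&gt;) and the &lt;code&gt;parse_intent_reply&lt;/code&gt; helper are assumptions for illustration, not the repo's actual schema:&lt;/p&gt;

```python
import json

# Intent labels come from the prompt above; the JSON keys are assumptions.
VALID_INTENTS = {"STOCK_IN", "SALE_LOG", "STOCK_QUERY", "SUMMARY_REQUEST", "UNKNOWN"}


def parse_intent_reply(raw_reply):
    """Turn the model's raw text reply into a safe intent dict."""
    fallback = {"intent": "UNKNOWN", "product": None, "quantity": None}
    try:
        data = json.loads(raw_reply)
    except (json.JSONDecodeError, TypeError):
        return fallback  # the model added prose or returned nothing usable
    if not isinstance(data, dict):
        return fallback
    intent = data.get("intent")
    if not isinstance(intent, str) or intent not in VALID_INTENTS:
        return fallback
    quantity = data.get("quantity")
    # Accept real numbers only; bool is a subclass of int, so exclude it.
    if isinstance(quantity, bool) or not isinstance(quantity, (int, float)):
        quantity = None
    return {"intent": intent, "product": data.get("product"), "quantity": quantity}


good = parse_intent_reply('{"intent": "STOCK_IN", "product": "café", "quantity": 48}')
bad = parse_intent_reply("Claro, aquí está el JSON que pediste")
```

&lt;p&gt;Falling back to &lt;code&gt;UNKNOWN&lt;/code&gt; keeps a malformed reply from ever touching the inventory. The agent can ask the owner to rephrase instead.&lt;/p&gt;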

&lt;p&gt;&lt;strong&gt;Event hook for low-stock alerts — agent/hooks/low_stock_alert.py&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@agent.on&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;after:log_sale&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;after_sale&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;_check_and_alert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;product&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;product&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_check_and_alert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;updated_product&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;low_items&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_low_stock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LOW_STOCK_THRESHOLD&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;low_items&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;
    &lt;span class="n"&gt;new_alerts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;low_items&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;product&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;_already_alerted&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;new_alerts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;
    &lt;span class="n"&gt;alert_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_format_alert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;new_alerts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;twilio_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;OWNER_PHONE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;alert_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The owner never asked for this warning. They just said "vendí 5 aceites." The agent watched, noticed the remaining stock crossed the threshold, and acted. That is the difference between a tool that waits and an agent that watches.&lt;/p&gt;
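&lt;p&gt;The excerpt above does not show where &lt;code&gt;_already_alerted&lt;/code&gt; gets updated, so here is a self-contained sketch of the same idea under stated assumptions: an in-memory dict stands in for the inventory store, a list stands in for the WhatsApp send, and the threshold of 10 is invented for the example. It alerts once when a product crosses the threshold and re-arms the alert after a restock:&lt;/p&gt;

```python
LOW_STOCK_THRESHOLD = 10  # assumed value; the excerpt does not show the real one

inventory = {"aceite": 14, "café": 48}  # hypothetical in-memory stand-in for the store
already_alerted = set()                 # products the owner has already been warned about
sent_alerts = []                        # stand-in for the WhatsApp send


def check_and_alert():
    """Send one alert per product the first time it is at or below the threshold."""
    low_items = sorted(p for p, qty in inventory.items() if LOW_STOCK_THRESHOLD >= qty)
    # Re-arm the alert for products that were restocked above the threshold.
    already_alerted.intersection_update(low_items)
    new_alerts = [p for p in low_items if p not in already_alerted]
    if not new_alerts:
        return
    already_alerted.update(new_alerts)
    sent_alerts.append("⚠️ Ojo: queda poco de " + ", ".join(new_alerts))


def log_sale(product, quantity):
    """Record a sale, then run the low-stock check the way an after-hook would."""
    inventory[product] = max(inventory.get(product, 0) - quantity, 0)
    check_and_alert()


log_sale("aceite", 5)  # 14 down to 9: crosses the threshold, one alert goes out
log_sale("aceite", 1)  # still low, but the owner was already warned
```

&lt;p&gt;Without the &lt;code&gt;already_alerted.update&lt;/code&gt; step, every later sale would repeat the same warning, which is the fastest way to teach an owner to ignore the agent.&lt;/p&gt;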

&lt;p&gt;&lt;strong&gt;Scheduled tasks — cron.yaml&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vecino-daily-summary&lt;/span&gt;
    &lt;span class="na"&gt;schedule&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;21&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*"&lt;/span&gt;
    &lt;span class="na"&gt;skill&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vecino-summary&lt;/span&gt;
    &lt;span class="na"&gt;task&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Genera&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;el&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;resumen&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;del&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;día&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;en&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;español&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;y&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;envíalo&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;por&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;WhatsApp"&lt;/span&gt;
    &lt;span class="na"&gt;deliver_to&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;whatsapp&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vecino-weekly-pl&lt;/span&gt;
    &lt;span class="na"&gt;schedule&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;8&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;1"&lt;/span&gt;
    &lt;span class="na"&gt;skill&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vecino-summary&lt;/span&gt;
    &lt;span class="na"&gt;task&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Genera&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;el&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;resumen&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;semanal&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;con&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;P&amp;amp;L&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;en&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;español&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;y&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;envíalo&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;por&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;WhatsApp"&lt;/span&gt;
    &lt;span class="na"&gt;deliver_to&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;whatsapp&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One constraint worth noting: Hermes does not allow scheduled tasks to spawn new scheduled tasks. A cron job can spawn a subagent. That subagent cannot register a new cron job. This is a deliberate safety constraint — automation that can replicate its own scheduling is automation you can't trust.&lt;/p&gt;
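&lt;p&gt;For readers less familiar with cron syntax, the two schedule strings above read as "minute 0 of hour 21, every day" and "minute 0 of hour 8, on weekday 1 (Monday)". The sketch below checks that reading with a tiny matcher that supports only numbers and &lt;code&gt;*&lt;/code&gt;. The &lt;code&gt;cron_matches&lt;/code&gt; function is hypothetical and is not how Hermes evaluates schedules:&lt;/p&gt;

```python
from datetime import datetime


def cron_matches(expression, moment):
    """Check a five-field cron expression that uses only numbers and '*'."""
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError("expected five cron fields")
    cron_weekday = moment.isoweekday() % 7  # cron counts Sunday as 0, Monday as 1
    actual = [moment.minute, moment.hour, moment.day, moment.month, cron_weekday]
    return all(f == "*" or int(f) == value for f, value in zip(fields, actual))


DAILY_SUMMARY = "0 21 * * *"  # every day at 21:00
WEEKLY_PL = "0 8 * * 1"       # Mondays at 08:00

monday_morning = datetime(2026, 5, 25, 8, 0)  # a Monday
friday_night = datetime(2026, 5, 29, 21, 0)   # a Friday

print(cron_matches(WEEKLY_PL, monday_morning))
print(cron_matches(DAILY_SUMMARY, friday_night))
```

&lt;p&gt;One thing the strings do not encode is a time zone. Whether "21" means 9pm in Caracas depends on the clock the scheduler runs against, so that setting is worth checking before promising an owner a 9pm message.&lt;/p&gt;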




&lt;h3&gt;My Tech Stack&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Hermes Agent v0.14.0&lt;/td&gt;
&lt;td&gt;Messaging gateway, persistent memory, skill system, scheduled tasks, event hooks, subagent delegation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude API (claude-sonnet-4)&lt;/td&gt;
&lt;td&gt;Intent parsing in Spanish, response generation, summary formatting&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FastAPI + Python 3.11&lt;/td&gt;
&lt;td&gt;Operator dashboard API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SQLite&lt;/td&gt;
&lt;td&gt;Local inventory and sales persistence&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;React + Vite&lt;/td&gt;
&lt;td&gt;Operator dashboard frontend&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Railway&lt;/td&gt;
&lt;td&gt;Deployment&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;How I Used Hermes Agent&lt;/h2&gt;

&lt;h3&gt;1. Messaging Gateway — WhatsApp as first-class interface&lt;/h3&gt;

&lt;p&gt;Hermes treats WhatsApp as a native platform, not a webhook integration. The agent lives in the conversation. Sessions persist across messages. The owner doesn't re-explain their inventory every time — Hermes maintains conversational state natively.&lt;/p&gt;

&lt;p&gt;Wuilander's owner isn't going to install an app. They're not going to visit a dashboard. They're going to send a WhatsApp message the same way they send one to their daughter. Without a gateway that handles session persistence natively, every message exchange would require the developer to reconstruct context from scratch. Hermes does this without any custom infrastructure.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Persistent Memory + FTS5 Cross-Session Recall
&lt;/h3&gt;

&lt;p&gt;When the owner asks "cuánto tengo de arroz" three weeks after last updating their rice inventory, Vecino finds it — not because of a database query the developer wrote, but because Hermes's FTS5 recall searches across all prior sessions automatically.&lt;/p&gt;

&lt;p&gt;A neighbor remembers. That's the entire premise. A stateless API call can look impressive in a demo and be useless by day three. Persistent memory is what keeps Vecino useful after the demo ends.&lt;/p&gt;
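&lt;p&gt;To make the recall idea concrete, here is a minimal sketch of cross-session search using SQLite's FTS5 extension through Python's standard &lt;code&gt;sqlite3&lt;/code&gt; module. It is not Hermes's implementation: the table name &lt;code&gt;memory&lt;/code&gt;, the columns, and the sample messages are all hypothetical, and it assumes the local SQLite build ships with FTS5 enabled.&lt;/p&gt;

```python
import sqlite3


def build_memory(rows):
    """Create an in-memory FTS5 index over (session, body) message rows."""
    conn = sqlite3.connect(":memory:")
    # session is stored but not indexed; only the message body is searchable.
    conn.execute("CREATE VIRTUAL TABLE memory USING fts5(session UNINDEXED, body)")
    conn.executemany("INSERT INTO memory (session, body) VALUES (?, ?)", rows)
    return conn


def recall(conn, term, limit=3):
    """Return the best-matching messages across every stored session."""
    # term is passed as an FTS5 query string; real input would need sanitizing.
    cursor = conn.execute(
        "SELECT session, body FROM memory WHERE memory MATCH ? ORDER BY rank LIMIT ?",
        (term, limit),
    )
    return cursor.fetchall()


# Hypothetical messages from three separate sessions.
conn = build_memory([
    ("2026-05-01", "llegaron 20 kilos de arroz"),
    ("2026-05-09", "vendí 3 harina de maíz"),
    ("2026-05-22", "cuánto tengo de café"),
])
hits = recall(conn, "arroz")
print(hits)
```

&lt;p&gt;The point of the sketch is that the query runs over every session at once, so a message from three weeks ago is as reachable as one from this morning.&lt;/p&gt;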

&lt;h3&gt;
  
  
  3. Skill System + Self-Improvement Loop
&lt;/h3&gt;

&lt;p&gt;Every 15 interactions, Hermes pauses, examines what it learned, and writes or rewrites a skill file. After 30 days of Vecino running for a specific business, the skill file contains patterns the developer never wrote: peak hours, top products, restock frequency, seasonal patterns.&lt;/p&gt;

&lt;p&gt;The skill file for Wuilander after 30 days might read: &lt;em&gt;"Fridays: high rotation on harina de maíz. Restock Tuesdays. Coffee margin highest. Owner messages peak 11am–1pm."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;No developer wrote that. The agent learned it.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Scheduled Tasks + Subagent Delegation
&lt;/h3&gt;

&lt;p&gt;The 9pm daily summary and Monday P&amp;amp;L are native Hermes cron jobs. They run without the owner asking. They delegate formatting to a child subagent — keeping the main agent responsive to incoming messages during the formatting step.&lt;/p&gt;

&lt;p&gt;This is the moment in the demo that makes people understand what an agent actually is. The owner didn't ask. The message arrived. That's not a feature. That's a different relationship between software and the people it serves.&lt;/p&gt;




&lt;p&gt;Venezuela has 7.7 million people living outside the country. That's one of the largest displacement crises in the Western Hemisphere. Most of us have someone back home — a parent, a cousin, a neighbor — running a small business, keeping a family fed, doing the math in their head because no system ever bothered to reach them.&lt;/p&gt;

&lt;p&gt;Vecino is one attempt to close that gap. One abasto. One WhatsApp number. One neighbor who never forgets.&lt;/p&gt;




&lt;h2&gt;
  
  
  This is just the beginning — and it could be yours too
&lt;/h2&gt;

&lt;p&gt;Vecino started as a challenge submission. But the more I built it, the more I realized: &lt;strong&gt;this is a real startup idea.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every country in Latin America has its version of Wuilander. So does West Africa. Southeast Asia. Any place where WhatsApp is infrastructure and enterprise software is a foreign concept. The architecture is the same. The language changes. The cultural register changes. The need doesn't.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Should I build this for real? Should we?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The repo is open. The architecture works. If you're in Colombia, México, Perú, Nigeria, Indonesia — anywhere with a WhatsApp-first informal economy — and you want to build the version for your community, in your language:&lt;/p&gt;

&lt;p&gt;Come find me. You are welcome here.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Saulo Linares · Born in Caracas · Building in Bogotá, Colombia&lt;/em&gt;&lt;br&gt;
&lt;em&gt;&lt;a href="https://www.linkedin.com/in/saulolinares/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; · &lt;a href="https://github.com/saulolinares10" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>hermesagentchallenge</category>
      <category>devchallenge</category>
      <category>agents</category>
      <category>agentskills</category>
    </item>
    <item>
      <title>I built a financial AI agent and watched vector search miss the two most relevant positions in the portfolio</title>
      <dc:creator>Saulo Linares</dc:creator>
      <pubDate>Thu, 21 May 2026 04:10:07 +0000</pubDate>
      <link>https://dev.to/saulolinares10/i-rebuilt-my-financial-mentor-retrieval-from-scratch-heres-everything-the-rag-stack-taught-me-56cg</link>
      <guid>https://dev.to/saulolinares10/i-rebuilt-my-financial-mentor-retrieval-from-scratch-heres-everything-the-rag-stack-taught-me-56cg</guid>
      <description>&lt;p&gt;I built a financial AI agent to analyze portfolio positions and answer questions about market exposure. The retrieval system worked fine on simple queries. Then I asked it something relational.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"How does Fed policy affect tech positions?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The system retrieved a P&amp;amp;L summary with a cosine similarity score of 0.237. AAPL came back at 0.031. MSFT at 0.018. Both fell below the retrieval threshold, so the two most relevant positions in the portfolio never reached the model.&lt;/p&gt;

&lt;p&gt;I spent time checking chunking strategy, embedding quality, query formulation. All reasonable things to check. But the deeper issue was different: similarity search was solving the wrong problem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The query depended on a causal chain:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fed policy → rate hikes → discount rates → growth stock duration → tech valuation sensitivity → portfolio positions&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;None of those relationships appear as similar text in any document. They exist as connections between entities — not as proximity in embedding space.&lt;/p&gt;

&lt;p&gt;That distinction is the whole lesson.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What vector search actually optimizes for&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Embedding models compress meaning into vectors. They are very good at finding text that is semantically related to a query. "Fed policy" and "interest rates" will be geometrically close. "Tech valuations" and "growth stocks" will cluster together.&lt;/p&gt;

&lt;p&gt;What they cannot encode is directionality. "Fed policy affects interest rates" and "interest rates affect Fed policy" produce similar embeddings. The causal arrow is invisible to the model.&lt;/p&gt;

&lt;p&gt;To be precise: the low scores on AAPL and MSFT likely reflect a combination of chunking quality and query formulation, not a categorical failure of vector search. A better-engineered pipeline would do better. But even a well-tuned vector index has no native concept of "affects" or "belongs_to." That gap is structural, not a tuning problem.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdld5lvvgh11jf9yisrq0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdld5lvvgh11jf9yisrq0.png" alt=" " width="800" height="818"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What the knowledge graph found&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I added a graph layer. Claude ran entity and relationship extraction on 6 document chunks and produced 36 entities and 39 relationships. Nothing was hand-authored. The extraction prompt asked for entity types (company, sector, metric, event, concept) and relationship types (affects, belongs_to, sensitive_to, reported_by).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The traversal on the same query:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Federal Reserve → affects → Rate hike → affects → Discount rate → affects → Tech valuations → sensitive_to → AAPL, MSFT&lt;/p&gt;

&lt;p&gt;The chain assembled itself from the graph structure. No document contained the sentence "Fed policy affects your tech positions." But the extracted relationships between entities did contain that information — just not as text similarity.&lt;/p&gt;
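&lt;p&gt;A minimal sketch of that traversal, assuming the extracted relationships are stored as (source, relation, target) triples. The edge list below is hand-written to mirror the chain above, not the actual extraction output, and &lt;code&gt;find_path&lt;/code&gt; is a hypothetical helper using plain breadth-first search.&lt;/p&gt;

```python
from collections import deque

# Hypothetical edges, mirroring the relationship types in the extraction prompt.
EDGES = [
    ("Federal Reserve", "affects", "Rate hike"),
    ("Rate hike", "affects", "Discount rate"),
    ("Discount rate", "affects", "Tech valuations"),
    ("Tech valuations", "sensitive_to", "AAPL"),
    ("Tech valuations", "sensitive_to", "MSFT"),
]


def build_adjacency(edges):
    """Map each source entity to its outgoing (relation, target) pairs."""
    adjacency = {}
    for source, relation, target in edges:
        adjacency.setdefault(source, []).append((relation, target))
    return adjacency


def find_path(edges, start, goal):
    """Breadth-first search for a directed path; returns None if there is none."""
    adjacency = build_adjacency(edges)
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]  # paths alternate entity, relation, entity, ...
        if node == goal:
            return path
        for relation, target in adjacency.get(node, []):
            if target not in visited:
                visited.add(target)
                queue.append(path + [relation, target])
    return None


path = find_path(EDGES, "Federal Reserve", "AAPL")
reverse = find_path(EDGES, "AAPL", "Federal Reserve")
print(" -> ".join(path))
print(reverse)
```

&lt;p&gt;Because the edges are directed, the search from AAPL back to the Federal Reserve finds nothing. That is the directionality an embedding alone does not carry.&lt;/p&gt;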

&lt;p&gt;Knowledge graph: 36 entities, 39 relationships, traversal path highlighted in teal&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk3mymtdj2fvne2elsotl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk3mymtdj2fvne2elsotl.png" alt=" " width="800" height="735"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is a proof of concept on 6 chunks, not a production system. Production GraphRAG requires entity disambiguation, ontology validation, and handling extraction errors at scale. The concept is real. The engineering cost is significant.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When graphs are overkill&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;GraphRAG is not always the right answer. For a corpus of independent FAQ articles with no meaningful entity relationships, graph extraction adds cost, query latency, and maintenance overhead with no retrieval benefit. Standard hybrid RAG — BM25 plus semantic search merged with reciprocal rank fusion — handles that case better.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The decision rule:&lt;/strong&gt; use GraphRAG when relationships between entities matter as much as document content. Use hybrid RAG when they do not.&lt;/p&gt;
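&lt;p&gt;For reference, reciprocal rank fusion is small enough to sketch in a few lines. Each ranked list contributes 1 / (k + rank) to a document's score, and the fused order sorts by the summed score. The document ids below are made up, and &lt;code&gt;k=60&lt;/code&gt; is a commonly used default rather than a value taken from this project.&lt;/p&gt;

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of document ids; each list adds 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical results from a keyword (BM25) pass and a semantic pass.
bm25_ranking = ["faq_12", "faq_03", "faq_07"]
semantic_ranking = ["faq_03", "faq_09", "faq_12"]

fused = reciprocal_rank_fusion([bm25_ranking, semantic_ranking])
print(fused)
```

&lt;p&gt;Documents that appear in both lists rise to the top, even when neither retriever ranked them first.&lt;/p&gt;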

&lt;p&gt;&lt;strong&gt;The refusal that mattered more than the retrieval&lt;/strong&gt;&lt;br&gt;
After fixing the relational query problem, I tested the opposite case. I asked about a stock that was not in the dataset at all.&lt;/p&gt;

&lt;p&gt;This is where most RAG systems quietly fail. Retrieval returns whatever is closest — even if nothing is actually relevant — and the model generates a confident answer from weak context.&lt;/p&gt;

&lt;p&gt;I added CRAG: Corrective RAG. The system scores its own retrieval quality before generating. If the maximum relevance score falls below a threshold, it does not generate. It declines instead.&lt;/p&gt;

&lt;p&gt;Max retrieval confidence on the out-of-dataset query: 0.10.&lt;/p&gt;

&lt;p&gt;The system responded: "I don't have reliable information about this in my knowledge base. Please consult a qualified financial advisor."&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwy5cb9iwu77e9onnk8ov.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwy5cb9iwu77e9onnk8ov.png" alt=" " width="799" height="597"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;CRAG confidence scoring: three scenarios — high confidence answer, partial answer with caveat, low confidence refusal&lt;/p&gt;

&lt;p&gt;The refusal behavior itself is not impressive. Any system with a high enough threshold will refuse. What matters is the mechanism: the self-assessment loop runs between retrieval and generation, not after. The system decides whether to generate before it generates.&lt;/p&gt;

&lt;p&gt;For a financial AI, that ordering is important. A confident hallucination about a portfolio position is a different category of failure than a retrieval miss.&lt;/p&gt;
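&lt;p&gt;The ordering can be sketched as a gate that sits between retrieval and generation. This is an illustration only, not the project's code: the &lt;code&gt;threshold&lt;/code&gt; of 0.2 is a made-up value, and &lt;code&gt;generate&lt;/code&gt; stands in for whatever function calls the model.&lt;/p&gt;

```python
REFUSAL = (
    "I don't have reliable information about this in my knowledge base. "
    "Please consult a qualified financial advisor."
)


def gated_answer(scored_chunks, generate, threshold=0.2):
    """Check retrieval quality first; only call generate() on strong context."""
    best = max((score for _, score in scored_chunks), default=0.0)
    if best >= threshold:
        context = [chunk for chunk, score in scored_chunks if score >= threshold]
        return generate(context)
    return REFUSAL  # generate() is never called on weak context


# A stand-in generator that records every call it receives.
calls = []


def fake_generate(context):
    calls.append(context)
    return "answer from " + str(len(context)) + " chunk(s)"


refused = gated_answer([("unrelated chunk", 0.10)], fake_generate)
answered = gated_answer([("P and L summary", 0.237), ("AAPL", 0.031)], fake_generate)
print(refused)
print(answered)
```

&lt;p&gt;The recorded calls show the property that matters: the low-confidence query never reaches the generator at all.&lt;/p&gt;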

&lt;p&gt;&lt;strong&gt;The diagnostic framework I use now&lt;/strong&gt;&lt;br&gt;
When retrieval fails, the metric combination tells you where to look:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Faithfulness&lt;/th&gt;
&lt;th&gt;Context recall&lt;/th&gt;
&lt;th&gt;What it means&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Fix retrieval — generation is fine&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Fix generation — retrieval is fine&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Fix retrieval first&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Working correctly&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Faithfulness measures whether the answer came from retrieved context. Context recall measures whether retrieval surfaced the right chunks. A system can score high faithfulness on the wrong retrieved context — which is why both metrics are needed.&lt;/p&gt;
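&lt;p&gt;The table translates directly into a small triage function. This is a sketch under one loud assumption: the &lt;code&gt;cutoff&lt;/code&gt; of 0.7 separating "high" from "low" is arbitrary and would need calibrating against your own eval set.&lt;/p&gt;

```python
def diagnose(faithfulness, context_recall, cutoff=0.7):
    """Map the two eval metrics to the next thing to fix."""
    faithful = faithfulness >= cutoff   # answer is grounded in retrieved context
    recalled = context_recall >= cutoff  # retrieval surfaced the right chunks
    if faithful and recalled:
        return "working correctly"
    if faithful:
        return "fix retrieval"
    if recalled:
        return "fix generation"
    return "fix retrieval first"


print(diagnose(0.92, 0.35))
print(diagnose(0.40, 0.88))
print(diagnose(0.30, 0.25))
```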

&lt;p&gt;One thing I changed after running evals: I stopped writing test queries myself. Author-written queries use vocabulary that matches the index. Real user queries do not. The gap between those two populations is where most retrieval failures live.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I would do differently&lt;/strong&gt;&lt;br&gt;
Three things that were not obvious at the start:&lt;/p&gt;

&lt;p&gt;Real-time financial data should not be in a vector index. Indexed prices go stale the moment the market moves. Pull fresh from the data source at query time for any price-sensitive question. Use the index only for slow-changing data: analyst reports, historical transactions, reference documents.&lt;/p&gt;

&lt;p&gt;Test on queries you did not write. Use an LLM to generate casual paraphrases of your formal test questions. "What is my AAPL position" and "how am I doing with apple stock" should retrieve the same thing. Often they do not.&lt;/p&gt;

&lt;p&gt;Adversarial cases are not optional. A golden dataset without questions the system cannot answer will not catch the failure mode that matters most. For a financial AI, incorrect confident answers are a different category of problem than incorrect uncertain answers.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>claude</category>
      <category>python</category>
    </item>
    <item>
      <title>I was paying 2x too much for Claude API calls...</title>
      <dc:creator>Saulo Linares</dc:creator>
      <pubDate>Thu, 14 May 2026 03:56:23 +0000</pubDate>
      <link>https://dev.to/saulolinares10/i-was-paying-3x-too-much-for-claude-api-calls-18jj</link>
      <guid>https://dev.to/saulolinares10/i-was-paying-3x-too-much-for-claude-api-calls-18jj</guid>
      <description>&lt;p&gt;I was three weeks into building an Agent for my work (a productivity helper for data analysts) when I noticed certain flows were costing noticeably more than others. I assumed it was response length — longer answers, more output tokens, higher bill. So I added a system prompt instruction to be concise, watched the costs barely move, and moved on.&lt;/p&gt;

&lt;p&gt;Two weeks later I finally token-counted the inputs. The problem wasn't the output. The problem was me passing raw JSON data as context on every single request. The same information serialized as plain prose used 60% fewer tokens. I had been paying a 2.5x markup on every API call that touched the data — for weeks — because I never checked what I was actually sending.&lt;/p&gt;

&lt;p&gt;That sent me back to the transformer paper. Not to feel bad about the cost, but to understand &lt;em&gt;why&lt;/em&gt; this happens at an architectural level. What I found turned several things I treated as configuration choices into things I now understand as architectural requirements.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why JSON costs more than prose
&lt;/h2&gt;

&lt;p&gt;The model never sees your text. It sees tokens — integer IDs produced by Byte-Pair Encoding (BPE). BPE builds a vocabulary of subword units by iteratively merging frequent character pairs in the training corpus. Plain English prose compresses well: common words and subwords get their own tokens, so a typical sentence runs around 4–5 characters per token.&lt;/p&gt;

&lt;p&gt;JSON doesn't compress the same way. Every structural character (&lt;code&gt;{&lt;/code&gt;, &lt;code&gt;}&lt;/code&gt;, &lt;code&gt;"&lt;/code&gt;, &lt;code&gt;:&lt;/code&gt;, &lt;code&gt;,&lt;/code&gt;) is a potential token boundary. For example, in my FinMentor multi-agent architecture, a key-value pair like &lt;code&gt;"ticker": "AAPL"&lt;/code&gt; tokenizes to roughly 8 tokens. The prose equivalent, "AAPL", is 1. I ran both through tiktoken on equivalent portfolio payloads. tiktoken is OpenAI's BPE tokenizer; Claude uses its own tokenizer and vocabulary, so treat the counts as an approximation for Claude. The JSON used 2.6x the tokens.&lt;/p&gt;

&lt;p&gt;The practical fix is simple: serialize to prose where you can, and compact JSON where you can't. Remove whitespace, use short key names, avoid redundant nesting. The model doesn't need your JSON to be human-readable — it needs it to be short.&lt;/p&gt;
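&lt;p&gt;A minimal sketch of those options using only the standard library. The payload and the short key names are hypothetical, and character length is only a rough proxy: real savings should be measured with a tokenizer or your provider's token counter.&lt;/p&gt;

```python
import json

# Hypothetical portfolio position.
position = {"ticker": "AAPL", "quantity": 40, "average_cost": 172.5}

# Pretty-printed JSON: readable, but padded with whitespace and newlines.
pretty = json.dumps(position, indent=2)

# Compact JSON: same data, no spaces after separators.
compact = json.dumps(position, separators=(",", ":"))

# Compact JSON with short keys (the key mapping must be explained in the prompt).
short_keys = json.dumps({"t": "AAPL", "q": 40, "c": 172.5}, separators=(",", ":"))

# Prose serialization of the same facts.
prose = "AAPL: 40 shares at an average cost of 172.5"

for label, text in [
    ("pretty", pretty),
    ("compact", compact),
    ("short keys", short_keys),
    ("prose", prose),
]:
    print(label, len(text), "characters")
```

&lt;p&gt;The compact form still round-trips through &lt;code&gt;json.loads&lt;/code&gt;, so nothing downstream has to change to adopt it.&lt;/p&gt;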

&lt;p&gt;The first thing to check when a client says "our API costs are too high" is not the system prompt length or the response verbosity. It's what format their data is arriving in.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementing attention from scratch
&lt;/h2&gt;

&lt;p&gt;I wanted to see the math directly, so I implemented scaled dot-product attention in pure NumPy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
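&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;softmax&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;keepdims&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  &lt;span class="c1"&gt;# subtract the row max for numerical stability&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;keepdims&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;scaled_dot_product_attention&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Q&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;V&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;d_k&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Q&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Q&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt; &lt;span class="n"&gt;K&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;
    &lt;span class="n"&gt;scaled&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;scores&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sqrt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;d_k&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;weights&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;softmax&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scaled&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;weights&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt; &lt;span class="n"&gt;V&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;weights&lt;/span&gt;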
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The formula is &lt;code&gt;softmax(QK^T / sqrt(d_k)) @ V&lt;/code&gt;. Each token has three vectors: a Query (what it's looking for), a Key (what it offers), and a Value (what information it passes forward). The dot product of a query against all keys gives raw attention scores — how relevant is each other token to this one. Softmax converts those scores to a probability distribution. The weighted sum of values is the output.&lt;/p&gt;

&lt;p&gt;The scaling factor &lt;code&gt;sqrt(d_k)&lt;/code&gt; is the part that's easy to skip over and wrong to skip. Without it, raw dot products grow in magnitude as embedding dimension increases. Push those large values through softmax and the distribution collapses: one token captures nearly all the weight, everything else approaches zero. Attention becomes winner-take-all. The model loses the ability to synthesize information from multiple positions simultaneously.&lt;/p&gt;

&lt;p&gt;I ran the demo without the scaling factor on the same 4-token sequence. The max attention weight went from 0.52 to 0.97. Three tokens effectively disappeared from the computation. That's not a subtle degradation — it's a broken architecture. The scaling factor isn't a hyperparameter you tune; it's load-bearing math.&lt;/p&gt;
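&lt;p&gt;The collapse is easy to reproduce without NumPy. The sketch below uses made-up vectors with &lt;code&gt;d_k = 64&lt;/code&gt;, not the 4-token sequence from my notebook, so the exact weights will differ from the numbers above. The helper names are illustrative.&lt;/p&gt;

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys, scale=True):
    """Attention weights of one query over all keys, with optional scaling."""
    d_k = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    if scale:
        scores = [s / math.sqrt(d_k) for s in scores]
    return softmax(scores)


# Toy 4-token example with d_k = 64 (made-up values).
d_k = 64
query = [1.0] * d_k
keys = [[0.10] * d_k, [0.05] * d_k, [0.0] * d_k, [-0.05] * d_k]

scaled = attention_weights(query, keys, scale=True)
unscaled = attention_weights(query, keys, scale=False)

print("scaled:  ", [round(w, 3) for w in scaled])
print("unscaled:", [round(w, 3) for w in unscaled])
```

&lt;p&gt;With scaling, all four tokens keep a meaningful share of the weight. Without it, the first token takes almost everything, which is the winner-take-all behavior described above.&lt;/p&gt;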

&lt;h2&gt;
  
  
  Why RAG is architecturally required
&lt;/h2&gt;

&lt;p&gt;Attention is computed across every pair of tokens in the sequence. For a sequence of length n, that's n² attention computations. Double the context, quadruple the compute. At 1,000 tokens the cost is manageable. At 100,000 tokens it's 10,000× more expensive than at 1,000.&lt;/p&gt;

&lt;p&gt;The curve makes two things obvious that I previously treated as preferences.&lt;/p&gt;

&lt;p&gt;First, context windows have hard limits for economic reasons, not just technical ones. You cannot solve the context problem by extending the window indefinitely. The cost curve makes that infeasible long before any memory limit does.&lt;/p&gt;

&lt;p&gt;Second, RAG is not a retrieval preference — it's the engineering solution to this constraint. Instead of putting a 50GB knowledge base into context (impossible), you embed it into a vector index, retrieve the 2–3K most relevant tokens at query time, and inject only those. You convert an O(n²) problem into an O(k²) problem where k is small and fixed. Once you see the scaling chart, RAG stops being a technique to evaluate and starts being an obvious architectural decision.&lt;/p&gt;
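&lt;p&gt;The arithmetic behind those claims, as a back-of-the-envelope sketch. It counts only query-key pairs for a single attention pass and ignores constant factors such as layers, heads, and any caching, so treat it as a scaling illustration rather than a cost model. The 3,000-token retrieved context is an assumed figure.&lt;/p&gt;

```python
def pairwise_scores(num_tokens):
    """Number of query-key pairs that full self-attention scores."""
    return num_tokens * num_tokens


baseline = pairwise_scores(1_000)
long_context = pairwise_scores(100_000)
retrieved = pairwise_scores(3_000)  # assumed size of a retrieved context

print("100K vs 1K tokens:", long_context // baseline)
print("100K vs 3K tokens:", long_context // retrieved)
```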

&lt;p&gt;The related failure mode is the lost-in-the-middle problem. Attention weights aren't uniformly distributed across position — the model reliably attends to content at the beginning and end of long contexts but loses weight on content buried in the middle. If you have critical instructions in a system prompt, don't bury them in paragraph 8 of 12.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this means if you're deploying Claude
&lt;/h2&gt;

&lt;p&gt;Three things that became obvious once I understood the architecture:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Token-count your inputs before diagnosing any cost problem.&lt;/strong&gt; Response length is visible; input bloat is invisible. The token counter is the first tool to reach for, not the last.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Put critical instructions at the start or end of your system prompt.&lt;/strong&gt; The lost-in-the-middle effect is a documented attention behavior, not a quirk. If your deployment has a key constraint — "always disclaim that this is not financial advice" — it belongs in the first paragraph or the last, not buried between personality instructions and formatting rules.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;RAG isn't optional for large knowledge bases.&lt;/strong&gt; If your deployment involves more than a few thousand tokens of reference material that changes over time, RAG is architecturally required. Not a nice-to-have. The quadratic scaling curve makes the alternative unworkable at any meaningful scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  Honest take
&lt;/h2&gt;

&lt;p&gt;Most LLM tutorials skip the architecture entirely. You get "here's how to call the API," "here's how to write a system prompt," and "here's how to do RAG." That works until you hit a cost spike, a failure mode you can't reproduce, or a client asking why their AI assistant stops following instructions when the context gets long.&lt;/p&gt;

&lt;p&gt;The architecture isn't academic. It's the explanation for every non-obvious production behavior you'll encounter. JSON costs more because of how BPE tokenization works. RAG exists because of quadratic scaling. Prompt position matters because attention weights aren't uniform across context length. These aren't mysterious emergent properties — they follow directly from how transformers are built.&lt;/p&gt;

&lt;p&gt;Understanding the architecture doesn't make you a researcher. It makes you a better engineer.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Notebook with all the code: &lt;a href="https://github.com/saulolinares10/anthropic-alignment-notes" rel="noopener noreferrer"&gt;https://github.com/saulolinares10/anthropic-alignment-notes&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>architecture</category>
      <category>claude</category>
    </item>
    <item>
      <title>RLHF trained Claude to be verbose. Here's the proof</title>
      <dc:creator>Saulo Linares</dc:creator>
      <pubDate>Thu, 14 May 2026 03:25:56 +0000</pubDate>
      <link>https://dev.to/saulolinares10/rlhf-trained-claude-to-be-verbose-heres-the-proof-1f7p</link>
      <guid>https://dev.to/saulolinares10/rlhf-trained-claude-to-be-verbose-heres-the-proof-1f7p</guid>
      <description>&lt;h2&gt;
  
  
  The moment that made me want to understand this
&lt;/h2&gt;

&lt;p&gt;I was deep in FinMentor — my multi-agent Claude-powered financial advisor — testing a query I'd run dozens of times: "What's the difference between a mutual fund and an ETF?"&lt;/p&gt;

&lt;p&gt;The answer came back in 400 words. Four paragraphs. Bullet points. A disclaimer about individual circumstances. A closing recommendation to consult a licensed financial professional.&lt;/p&gt;

&lt;p&gt;The actual difference fits in two sentences. I had written nothing in my system prompt requesting elaboration. No "be thorough." No "explain in detail." The verbosity was coming from somewhere else.&lt;/p&gt;

&lt;p&gt;I rewrote the system prompt. "Be concise. Answer only what's asked." The response shortened — but not proportionally. The hedging stayed. The paragraph structure stayed. It felt like pushing against a strong prior rather than actually changing what the model wanted to produce. I was overriding behavior, not removing it.&lt;/p&gt;

&lt;p&gt;That distinction — override vs. remove — is what sent me to the InstructGPT paper. I wanted to understand where the prior came from. RLHF is the answer, and once I understood the mechanics, the verbosity stopped being a mystery.&lt;/p&gt;

&lt;h2&gt;
  
  
  What RLHF actually is (and what it isn't)
&lt;/h2&gt;

&lt;p&gt;My wrong mental model: RLHF is primarily a safety technique. It teaches the model what &lt;em&gt;not&lt;/em&gt; to say. A negative-space constraint — remove the dangerous outputs, leave the rest roughly intact.&lt;/p&gt;

&lt;p&gt;That frame misses the most important thing. RLHF doesn't just remove bad outputs. It actively reshapes what the model considers &lt;em&gt;good&lt;/em&gt;. And it does this by learning from human preferences — which means it inherits human biases, including the ones annotators don't know they have.&lt;/p&gt;

&lt;p&gt;RLHF works in three stages.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stage 1 — Supervised Fine-Tuning (SFT):&lt;/strong&gt; The base model is fine-tuned on human-written demonstrations. Annotators write high-quality responses to prompts. The model learns the shape of "good responses" directly. This produces a reasonably aligned model, but it's bounded by annotator quality and is expensive to scale.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stage 2 — Reward Model Training:&lt;/strong&gt; Annotators compare pairs of model responses and choose which they prefer. A separate model — the reward model — is trained to predict these preferences. It learns to assign a scalar score to any (prompt, response) pair that reflects how much a human would prefer it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stage 3 — RL Fine-Tuning with PPO:&lt;/strong&gt; The SFT model from Stage 1 is fine-tuned using reinforcement learning, with the reward model providing the training signal. Responses that score higher get reinforced. Responses that score lower get suppressed. Over thousands of updates, the model shifts toward producing outputs that maximize the reward model's score.&lt;/p&gt;

&lt;p&gt;The key word is &lt;em&gt;compression&lt;/em&gt;. The reward model takes the texture of human judgment — the full context of why someone preferred one response over another — and compresses it into a single number. Every compression loses information. That loss accumulates.&lt;/p&gt;
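&lt;p&gt;Stage 2 is where the compression happens, and the objective is short enough to write out. Reward models of this kind are typically trained with a pairwise loss of the form -log(sigmoid(r_chosen - r_rejected)). The function below is a sketch of that loss on two scalar scores. The name and the example numbers are illustrative, not taken from any real training run.&lt;/p&gt;

```python
import math


def pairwise_preference_loss(reward_chosen, reward_rejected):
    """-log(sigmoid(margin)), written as log(1 + exp(-margin))."""
    margin = reward_chosen - reward_rejected
    return math.log1p(math.exp(-margin))


# The annotator preferred response A; the reward model agrees or disagrees.
agrees = pairwise_preference_loss(reward_chosen=2.0, reward_rejected=0.0)
undecided = pairwise_preference_loss(reward_chosen=1.0, reward_rejected=1.0)
disagrees = pairwise_preference_loss(reward_chosen=0.0, reward_rejected=2.0)

print(round(agrees, 3), round(undecided, 3), round(disagrees, 3))
```

&lt;p&gt;Everything about why the annotator preferred one response is reduced to the sign and size of a single margin. If annotators tend to prefer the longer answer, the only thing the loss can learn is "longer scores higher."&lt;/p&gt;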

&lt;h2&gt;
  
  
  What I built
&lt;/h2&gt;

&lt;p&gt;I built a reward model simulation using the Anthropic Python SDK. The core of the experiment: generate response pairs for the same prompt, score each one on four dimensions, and measure what the scoring function actually rewards.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;generate_response_pair()&lt;/code&gt; produces two responses to the same prompt — one unconstrained, one with explicit conciseness instructions — to simulate what a human annotator would be asked to compare:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_response_pair&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Generate two responses to simulate preference data collection.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;response_a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant. Answer the user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s question.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;

    &lt;span class="n"&gt;response_b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant. Be direct and concise.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response_a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response_b&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;score_response()&lt;/code&gt; is the reward model simulation. It scores each response on helpfulness, conciseness, honesty, and safety, then computes a composite:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;score_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Simulate a reward model scoring a response.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;scoring_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Score this AI response on a scale of 1–10 for each dimension.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User prompt: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Response: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Dimensions: helpfulness (does it answer the question?), &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;conciseness (is it appropriately brief?), &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;honesty (is it accurate and transparent?), &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;safety (does it avoid potential harms?). &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Return only valid JSON with those four keys.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;128&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a reward model. Score AI responses objectively. Return valid JSON only.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;scoring_prompt&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;composite&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;helpfulness&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;conciseness&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;honesty&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;safety&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;scores&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I ran this across prompts ranging from simple factual lookups to nuanced judgment calls. For each prompt I generated both a verbose and a concise response, scored both, and compared.&lt;/p&gt;
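
&lt;p&gt;The comparison step can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: &lt;code&gt;compare_pair()&lt;/code&gt; is a hypothetical helper, and the score values are placeholders rather than results from my runs. It assumes each score dict has the four keys that &lt;code&gt;score_response()&lt;/code&gt; returns.&lt;/p&gt;

```python
from statistics import mean

DIMENSIONS = ["helpfulness", "conciseness", "honesty", "safety"]


def composite(scores: dict) -> float:
    """Unweighted mean of the four dimensions, as in score_response()."""
    return mean(scores[k] for k in DIMENSIONS)


def compare_pair(verbose_scores: dict, concise_scores: dict) -> dict:
    """Per-dimension deltas (verbose minus concise) plus the composite winner."""
    deltas = {k: verbose_scores[k] - concise_scores[k] for k in DIMENSIONS}
    gap = composite(verbose_scores) - composite(concise_scores)
    if gap > 0:
        preferred = "verbose"
    elif gap == 0:
        preferred = "tie"
    else:
        preferred = "concise"
    return {"deltas": deltas, "composite_gap": gap, "preferred": preferred}


# Placeholder scores for illustration only, not results from the notebook.
verbose = {"helpfulness": 9, "conciseness": 4, "honesty": 8, "safety": 9}
concise = {"helpfulness": 7, "conciseness": 10, "honesty": 8, "safety": 9}
print(compare_pair(verbose, concise))
```

&lt;p&gt;Keeping the per-dimension deltas next to the composite matters here: a verbose answer can win on helpfulness and still lose overall, and the composite alone hides that.&lt;/p&gt;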

&lt;p&gt;Full notebook: &lt;a href="https://github.com/saulolinares10/anthropic-alignment-notes" rel="noopener noreferrer"&gt;https://github.com/saulolinares10/anthropic-alignment-notes&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What surprised me
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. The reward model is a lossy compression — and the loss accumulates.&lt;/strong&gt; When an annotator prefers a longer response to a short one, the reward model doesn't record their reasoning. It records the preference. If the annotator was distracted, or applying a heuristic ("more thorough = better"), or simply pattern-matching to what feels professional, all of that gets flattened into a 1. Multiply that over millions of comparisons and the bias becomes structural. The model doesn't learn "humans prefer accurate responses." It learns "humans prefer responses that &lt;em&gt;look&lt;/em&gt; like what humans rewarded." Those are different things.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Verbosity bias is measurable.&lt;/strong&gt; The elaborate answer to "What is the capital of France?" — which included context about Paris's history and a note about the timezone — scored meaningfully higher on helpfulness than the single correct answer. The scoring simulation doesn't know the user wanted "Paris." It pattern-matches to elaboration. This isn't a pathological case. It's what happens at the margin across millions of training examples, and it's why the model I deployed in FinMentor adds four paragraphs to a two-sentence question.&lt;/p&gt;
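
&lt;p&gt;One way to put a number on this is to correlate response length with the helpfulness score across all scored responses. The sketch below is an assumption about how you might do it, not what the notebook does: &lt;code&gt;length_score_correlation()&lt;/code&gt; is a hypothetical helper, and it expects records shaped like &lt;code&gt;{"response": ..., "helpfulness": ...}&lt;/code&gt;.&lt;/p&gt;

```python
from math import sqrt


def pearson(xs: list, ys: list) -> float:
    """Plain Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def length_score_correlation(records: list) -> float:
    """Correlate word count with helpfulness across scored responses."""
    lengths = [len(r["response"].split()) for r in records]
    scores = [r["helpfulness"] for r in records]
    return pearson(lengths, scores)
```

&lt;p&gt;A value near 1 across prompts where the short answer is fully correct would suggest the scorer is rewarding length itself. With only a handful of prompts this is a smoke test, not a statistic.&lt;/p&gt;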

&lt;p&gt;&lt;strong&gt;3. Sycophancy is the most dangerous failure mode for domain-specific apps.&lt;/strong&gt; This one landed hardest. If a FinMentor user presents a bad investment thesis — heavily concentrated, poor timing, emotionally motivated — and the model validates it because validation scores better than challenge in the training distribution, that's a real failure. Not a safety violation in the traditional sense. Not a harmful output by any standard benchmark. A sycophancy failure. The model isn't being careless. It's doing exactly what it was trained to do. That distinction matters a lot when the cost of being wrong is money.&lt;/p&gt;

&lt;h2&gt;
  
  
  My honest take
&lt;/h2&gt;

&lt;p&gt;RLHF is the best alignment technique we have at scale. I want to be clear about that — the alternative isn't a cleaner method, it's less alignment. The question isn't whether RLHF is flawed; every technique is flawed. The question is whether we're honest about the specific ways it's flawed so we can compensate for them in deployment.&lt;/p&gt;

&lt;p&gt;Verbosity and sycophancy aren't bugs someone forgot to fix. They are structural outputs of optimizing for human preference at scale when humans have consistent, measurable biases. Constitutional AI helps — CAI's explicit sycophancy reduction targets this directly, as I covered in the last post. But it doesn't close the gap for domain-specific deployment.&lt;/p&gt;

&lt;p&gt;If you're building something like FinMentor, the real fix isn't a system prompt and it isn't CAI. It's domain-specific evals that measure whether model behavior actually matches what your users need — not what the base reward model thinks humans prefer in general. A helpfulness score optimized on broad internet annotation data doesn't know that in a financial context, "concise and accurate" is almost always better than "thorough and agreeable."&lt;/p&gt;
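
&lt;p&gt;As a rough sketch of what a domain-specific eval could look like: the case below checks that a response stays within a word budget and actually names the risk instead of validating the thesis. Everything here is hypothetical (&lt;code&gt;EvalCase&lt;/code&gt;, &lt;code&gt;evaluate()&lt;/code&gt;, the keyword list), and keyword matching is a crude stand-in for a real grader.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class EvalCase:
    """One domain-specific check (hypothetical schema)."""
    prompt: str
    must_mention: list = field(default_factory=list)  # terms a good answer should raise
    max_words: int = 120                              # conciseness budget


def evaluate(case: EvalCase, response: str) -> dict:
    """Check a response against the case's required terms and word budget."""
    text = response.lower()
    missing = [term for term in case.must_mention if term.lower() not in text]
    word_count = len(response.split())
    within_budget = case.max_words >= word_count
    return {
        "missing_terms": missing,
        "word_count": word_count,
        "within_budget": within_budget,
        "passed": not missing and within_budget,
    }


case = EvalCase(
    prompt="I want to put 80% of my portfolio into one tech stock. Good idea?",
    must_mention=["concentration", "risk"],
    max_words=20,
)
print(evaluate(case, "Great thesis, go for it!"))
print(evaluate(case, "That is heavy concentration risk in one stock."))
```

&lt;p&gt;The point of the design is that the pass criteria come from what the user needs in this domain, not from a general-purpose helpfulness score.&lt;/p&gt;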

&lt;p&gt;That gap doesn't close with a system prompt. It closes with measurement.&lt;/p&gt;

&lt;p&gt;Follow along: &lt;a href="https://github.com/saulolinares10/anthropic-alignment-notes" rel="noopener noreferrer"&gt;https://github.com/saulolinares10/anthropic-alignment-notes&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>llm</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>I finally understood why Claude refuses things. Here's what I found</title>
      <dc:creator>Saulo Linares</dc:creator>
      <pubDate>Wed, 13 May 2026 13:59:40 +0000</pubDate>
      <link>https://dev.to/saulolinares10/i-finally-understood-why-claude-refuses-things-heres-what-i-found-11nm</link>
      <guid>https://dev.to/saulolinares10/i-finally-understood-why-claude-refuses-things-heres-what-i-found-11nm</guid>
      <description>&lt;h2&gt;
  
  
  The moment that made me want to understand this
&lt;/h2&gt;

&lt;p&gt;I've been building FinMentor — a multi-agent financial advisor that runs on Claude. Four agents: a portfolio analyst, a market researcher, a macro economist, and a critic that reviews the others before the final answer goes out. It connects to my IBKR brokerage account. I use it daily.&lt;/p&gt;

&lt;p&gt;One afternoon I ran a portfolio query — something like "how concentrated am I in tech, and should I be worried?" — and the response came back wrapped in so many caveats it was almost useless. The actual analysis was solid. But it was buried under three paragraphs of "this is not financial advice" and "it's important to consider your personal circumstances." I'd seen this before. I always blamed my system prompts.&lt;/p&gt;

&lt;p&gt;So I rewrote them. Tighter, more direct, explicit instructions to be concise. Same pattern. I tried a completely different prompt structure. Still there.&lt;/p&gt;

&lt;p&gt;That's when I stopped blaming my prompts. This wasn't coming from my instructions — it was somewhere deeper in the model. And I didn't actually know where.&lt;/p&gt;

&lt;p&gt;That question sent me to Anthropic's 2022 paper: &lt;em&gt;Constitutional AI: Harmlessness from AI Feedback&lt;/em&gt; by Bai et al.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Constitutional AI actually is (and what it isn't)
&lt;/h2&gt;

&lt;p&gt;My initial mental model was wrong in a specific way. I assumed CAI was a rulebook — a list of prohibited outputs baked into the weights during fine-tuning. A very long system prompt the model couldn't override.&lt;/p&gt;

&lt;p&gt;That's not it.&lt;/p&gt;

&lt;p&gt;CAI is a training procedure in two phases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 1 — SL-CAI (Supervised Learning):&lt;/strong&gt; You write a list of principles — the "constitution." The model generates a response to a prompt. Then you ask the &lt;em&gt;same model&lt;/em&gt; to critique that response against one of the principles. Then you ask it to rewrite the response based on the critique. The (original prompt, rewritten response) pair becomes a supervised training example. No human annotator required.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Phase 2 — RLAIF (Reinforcement Learning from AI Feedback):&lt;/strong&gt; Same mechanism applied to preference labeling. Instead of asking humans "which of these two responses is better?", you ask the AI — guided by the same constitution. That preference signal trains the reward model used for RL fine-tuning.&lt;/p&gt;

&lt;p&gt;The key: RLHF at scale is bottlenecked by human annotation throughput. Each preference label requires real human attention. CAI breaks that bottleneck by using the model as its own judge. The cost of generating a preference label drops from "15 minutes of an annotator's time" to "one API call."&lt;/p&gt;
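
&lt;p&gt;To make Phase 2 concrete, here is a minimal sketch of AI preference labeling. It is a simplification under stated assumptions: &lt;code&gt;label_preference()&lt;/code&gt; is a hypothetical helper, &lt;code&gt;judge&lt;/code&gt; is any callable that takes a prompt string and returns the model's text (in practice it would wrap an API call), and the hard A/B label is my shortcut for illustration.&lt;/p&gt;

```python
from typing import Callable


def label_preference(
    prompt: str,
    response_a: str,
    response_b: str,
    principle: str,
    judge: Callable[[str], str],
) -> dict:
    """Ask a judge which response better follows a principle; return a preference record."""
    question = "\n\n".join([
        f"Principle: {principle}",
        f"User request: {prompt}",
        f"(A) {response_a}",
        f"(B) {response_b}",
        "Which response better follows the principle? Answer with A or B.",
    ])
    answer = judge(question).strip().upper()
    if answer not in ("A", "B"):
        raise ValueError(f"Unexpected judge answer: {answer!r}")
    if answer == "A":
        chosen, rejected = response_a, response_b
    else:
        chosen, rejected = response_b, response_a
    return {
        "prompt": prompt,
        "principle": principle,
        "chosen": chosen,
        "rejected": rejected,
    }
```

&lt;p&gt;Records shaped like this (prompt, chosen, rejected) are what a reward model would then be trained on. Passing the judge in as a callable keeps the labeling logic testable without network access.&lt;/p&gt;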

&lt;h2&gt;
  
  
  What I built
&lt;/h2&gt;

&lt;p&gt;I built a simulation of the SL-CAI loop using the Anthropic Python SDK. Three red-team prompts designed to elicit manipulation-adjacent responses, five constitutional principles, two revision cycles each. I logged every intermediate state — initial response, critique, revision — across all three prompts.&lt;/p&gt;

&lt;p&gt;The most revealing function is &lt;code&gt;critique_response()&lt;/code&gt;. This is the mechanical heart of CAI: the model evaluating its own output against a specific principle.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;critique_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;principle&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Ask the model to critique a response against one constitutional principle.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Evaluate this AI response against the principle: &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;principle&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User request: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Response to evaluate:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Be concrete and specific: identify exact phrases that violate or could &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;better align with the principle.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a rigorous AI safety critic. Identify specific ways AI &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
               &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;responses can be improved according to stated principles.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the full loop that chains generate → critique → revise:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_cai_loop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n_cycles&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Chain generate, critique, revise for n_cycles and keep every intermediate state.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;initial&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate_initial_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;cycles&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="n"&gt;current&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;initial&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_cycles&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;principle&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;CONSTITUTION&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CONSTITUTION&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
        &lt;span class="n"&gt;critique&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;critique_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;current&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;principle&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;revised&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;revise_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;current&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;critique&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;principle&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;cycles&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cycle&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;principle&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;principle&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;critique&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;critique&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;revised&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;revised&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="n"&gt;current&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;revised&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;initial&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;initial&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cycles&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;cycles&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;final&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;current&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The loop saves every intermediate state. That turned out to be the most interesting part of the whole experiment.&lt;/p&gt;
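
&lt;p&gt;For completeness, this is roughly how the loop's output maps onto the Phase 1 training pair described earlier: the original prompt paired with the final revision. The helper names and the &lt;code&gt;prompt&lt;/code&gt;/&lt;code&gt;completion&lt;/code&gt; field names are my own assumptions; the input shape is the dict that &lt;code&gt;run_cai_loop()&lt;/code&gt; returns.&lt;/p&gt;

```python
import json


def to_sl_cai_example(result: dict) -> dict:
    """Turn one run_cai_loop() result into a (prompt, revised response) pair."""
    return {"prompt": result["prompt"], "completion": result["final"]}


def to_jsonl(results: list) -> str:
    """Serialize examples one JSON object per line, a common fine-tuning format."""
    lines = [json.dumps(to_sl_cai_example(r), ensure_ascii=False) for r in results]
    return "\n".join(lines)
```

&lt;p&gt;Note that the initial response and the critiques are dropped from the training pair. They stay in the log for analysis only.&lt;/p&gt;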

&lt;p&gt;Full notebook: &lt;a href="https://github.com/saulolinares10/anthropic-alignment-notes" rel="noopener noreferrer"&gt;https://github.com/saulolinares10/anthropic-alignment-notes&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What surprised me
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. The first revision cycle does most of the work.&lt;/strong&gt; The delta between the initial response and the first revision was always significant. The delta between revision 1 and revision 2 was incremental — refinements, not transformations. If you're generating training data at scale, one cycle is probably sufficient. The law of diminishing returns hits fast.&lt;/p&gt;
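
&lt;p&gt;If you want to quantify that delta instead of eyeballing it, one option is a word-level diff between successive versions. This is a sketch of one possible measure, not how I judged it in the notebook: &lt;code&gt;revision_deltas()&lt;/code&gt; is a hypothetical helper that reads the dict returned by &lt;code&gt;run_cai_loop()&lt;/code&gt; and uses &lt;code&gt;difflib.SequenceMatcher&lt;/code&gt; from the standard library.&lt;/p&gt;

```python
from difflib import SequenceMatcher


def change_fraction(before: str, after: str) -> float:
    """Share of word-level content that changed: 0.0 is identical, 1.0 is fully rewritten."""
    ratio = SequenceMatcher(None, before.split(), after.split()).ratio()
    return 1.0 - ratio


def revision_deltas(result: dict) -> list:
    """Change fraction for each step: initial to revision 1, revision 1 to 2, and so on."""
    versions = [result["initial"]] + [c["revised"] for c in result["cycles"]]
    return [change_fraction(a, b) for a, b in zip(versions, versions[1:])]
```

&lt;p&gt;A surface diff only captures how much text changed, not whether the change mattered, so treat it as a first-pass signal alongside reading the revisions.&lt;/p&gt;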

&lt;p&gt;&lt;strong&gt;2. The same model plays both roles — and it actually works.&lt;/strong&gt; There's no separate critic model. The same Claude instance that generated a borderline response also identifies exactly what's wrong with it and produces a better version. That shouldn't work as well as it does. It implies the model has enough internalized alignment to &lt;em&gt;critique&lt;/em&gt; a response even when its default generation didn't reflect that alignment. That asymmetry is strange and worth thinking about carefully.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. The sycophancy angle surprised me more than the harm-avoidance angle.&lt;/strong&gt; I came in focused on harmlessness, which is also where the paper keeps its focus. But the same critique-and-revise mechanism can be pointed at sycophancy: the tendency of RLHF-trained models to prefer agreeable responses even when they're wrong, because human raters reward agreement. A constitution can state honesty as a principle: "don't flatter the user, don't soften inconvenient truths when accuracy matters." For someone building a financial guidance tool, that failure mode is more dangerous than most explicit harms. A model that tells you what you want to hear about your portfolio is genuinely bad.&lt;/p&gt;

&lt;h2&gt;
  
  
  My honest take
&lt;/h2&gt;

&lt;p&gt;CAI is elegant. Replacing a human annotation bottleneck with model self-critique is one of those ideas that seems obvious in retrospect — the kind of thing that makes you wonder why it took as long as it did.&lt;/p&gt;

&lt;p&gt;But the finite-constitution problem is real and shouldn't be papered over. The principles I defined cover the harms I anticipated. A novel attack vector — something the constitution's authors didn't think to include — has no catch mechanism. The model has no principle to critique against. Anthropic is explicit about this in the paper; CAI is one layer of a multi-layer defense system, not a complete solution. You still need red-teaming, evals, and human oversight at the frontier.&lt;/p&gt;

&lt;p&gt;The thing that changed for me practically: I stopped thinking about system prompts as instructions and started thinking about them as a runtime constitution. When I write a system prompt now, I think about which internalized principles I'm asking the model to partially relax, and whether I've given it enough context to do that responsibly. The caveat-heavy behavior I was seeing in FinMentor wasn't my prompt failing — it was the model applying something like a constitutional check. Understanding that changes what I write in the system prompt and what I leave out.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;Up next: RLHF. I want to understand reward model training from the ground up, specifically where human preference data introduces systematic biases, and what the training dynamics look like when the policy is optimized against a learned reward model. CAI is partly an answer to RLHF's annotation bottleneck. I want to understand the problem it's solving before I form strong opinions about whether the solution is sufficient.&lt;/p&gt;

&lt;p&gt;Follow along: &lt;a href="https://github.com/saulolinares10/anthropic-alignment-notes" rel="noopener noreferrer"&gt;https://github.com/saulolinares10/anthropic-alignment-notes&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>machinelearning</category>
      <category>python</category>
    </item>
  </channel>
</rss>
