<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Nadine </title>
    <description>The latest articles on DEV Community by Nadine  (@nadinev).</description>
    <link>https://dev.to/nadinev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3232377%2F6ced2e7e-bd7e-4baf-8a96-29a220663fc5.png</url>
      <title>DEV Community: Nadine </title>
      <link>https://dev.to/nadinev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/nadinev"/>
    <language>en</language>
    <item>
      <title>Building an AI WhatsApp Agent with OpenClaw: A Practical Field Guide</title>
      <dc:creator>Nadine </dc:creator>
      <pubDate>Mon, 27 Apr 2026 02:08:26 +0000</pubDate>
      <link>https://dev.to/nadinev/building-an-ai-whatsapp-agent-with-openclaw-a-practical-field-guide-51kc</link>
      <guid>https://dev.to/nadinev/building-an-ai-whatsapp-agent-with-openclaw-a-practical-field-guide-51kc</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/openclaw-2026-04-16"&gt;OpenClaw Writing Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  About this Series
&lt;/h2&gt;

&lt;p&gt;I built an agent to monitor and respond to my WhatsApp messages: it manages memory, history, and relationships with contacts, and runs on a blazing-fast inference layer within a capped token budget.&lt;/p&gt;

&lt;p&gt;Most of what you'll read here I learned the hard way.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;A five-part series on building a real, production-minded AI agent: multilingual, multimodal, and connected to WhatsApp on a 1M token/day budget.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkkmbirwn8trbjvjo54w9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkkmbirwn8trbjvjo54w9.png" alt="Architecture diagram of OpenClaw showing the layered relationship between Brain, Voice, Senses, and Connection." width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Title&lt;/th&gt;
&lt;th&gt;What You'll Learn&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;01&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;&lt;a href="https://dev.to/nadinev/the-brain-setting-up-openclaw-jd9"&gt;(The Brain)  Setting Up OpenClaw&lt;/a&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Installing OpenClaw, choosing your model, configuring the &lt;code&gt;main&lt;/code&gt; agent, workspace layout, context compaction, and establishing a markdown contract for consistent output&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;02&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;&lt;a href="https://dev.to/nadinev/the-voice-multilingual-layer-4mf0"&gt;(The Voice) Multilingual Layer&lt;/a&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Building Silas the Language Sentry, automatic language detection, multilingual response handling, and how this connects to the WhatsApp bridge&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;03&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;&lt;a href="https://dev.to/nadinev/the-senses-image-generation-media-266k"&gt;(The Senses) Image Generation &amp;amp; Media&lt;/a&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Working with &lt;code&gt;tools.deny&lt;/code&gt; and &lt;code&gt;tools.media&lt;/code&gt; scopes, owner-only image generation, deny-first permission design, and managing latency UX for media responses&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;04&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;&lt;a href="https://dev.to/nadinev/the-connection-whatsapp-bridge-1962"&gt;(The Connection) WhatsApp Bridge&lt;/a&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Setting up the gateway (token + loopback), Docker deployment pattern, WhatsApp channel config, session management, and group handling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;05&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;&lt;a href="https://dev.to/nadinev/future-outlook-operating-model-8jp"&gt;Future Outlook &amp;amp; Operating Model&lt;/a&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;End-to-end system flow, ops checklist, Lingo and Tailscale on the roadmap, and a full recommended reading order for the series&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Companion (deep dive, not a numbered part):&lt;/strong&gt; &lt;code&gt;OpenClaw Skill Shield: Multilingual Edition&lt;/code&gt; — Skill Shield, identity leakage, multilingual gap, and config tables.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>openclawchallenge</category>
    </item>
    <item>
      <title>Future Outlook &amp; Operating Model</title>
      <dc:creator>Nadine </dc:creator>
      <pubDate>Mon, 27 Apr 2026 02:07:38 +0000</pubDate>
      <link>https://dev.to/nadinev/future-outlook-operating-model-8jp</link>
      <guid>https://dev.to/nadinev/future-outlook-operating-model-8jp</guid>
      <description>&lt;h3&gt;
  
  
  &lt;strong&gt;The Catalyst: A System, Not a Demo&lt;/strong&gt;
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;OpenClaw stops being a toy the day you &lt;strong&gt;run it for a week&lt;/strong&gt;: models change, skills update, logs grow, and someone new will try the one message you did not test. The &lt;em&gt;Practical Guide&lt;/em&gt; is not a single prompt; it is a &lt;strong&gt;repeatable stack&lt;/strong&gt;: Brain, Voice, Senses, Connection, plus the boring discipline of operations.&lt;/p&gt;

&lt;p&gt;This final article in the series ties the four phases together, lists an &lt;strong&gt;operating checklist&lt;/strong&gt; you can run monthly, and names future directions (Lingo-style translation, remote gateway access) without pretending they are free.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Overview: The End-to-End Picture&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Flow (conceptual):&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;WhatsApp (or other channel)&lt;/strong&gt; delivers a message to the &lt;strong&gt;Gateway&lt;/strong&gt; (auth, routing).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Session&lt;/strong&gt; scoping and idle/maintenance apply (&lt;code&gt;dmScope&lt;/code&gt;, reset, prune).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Silas&lt;/strong&gt; (Voice) can pre-screen; &lt;strong&gt;Senses&lt;/strong&gt; (media / image) obey allow-scopes and deny-lists.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model&lt;/strong&gt; in OpenClaw produces a reply; &lt;strong&gt;Logging&lt;/strong&gt; can redact sensitive tool content.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Workspace&lt;/strong&gt; and optional memory files back long-lived intent — under &lt;strong&gt;Brain&lt;/strong&gt; policy.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A simple mental diagram:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User → Channel → Gateway (auth) → Session(key) → Skills + Tools → Model → Reply
                                    ↑
                         Workspace (identity, user, SOUL) + openclaw.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Connection recap:&lt;/em&gt; I run that &lt;strong&gt;gateway&lt;/strong&gt; as a &lt;strong&gt;normal process&lt;/strong&gt; on the host, not in a container; part 4 is the source of truth for how the WhatsApp bridge and allowlists fit together.&lt;/p&gt;




&lt;h3&gt;
  
  
  In this section:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;1. Operating Model: Weekly Habits&lt;/li&gt;
&lt;li&gt;2. Safety Checklist (First Deploy + Ongoing)&lt;/li&gt;
&lt;li&gt;3. Future Outlook: Translation and Lingodotdev&lt;/li&gt;
&lt;li&gt;4. Remote Access: Tailscale vs Expose Port&lt;/li&gt;
&lt;li&gt;5. Ecosystem and Ethos&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;1. Operating Model: Weekly Habits&lt;/strong&gt;
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Habit&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Check &lt;code&gt;openclaw.json&lt;/code&gt; in git&lt;/strong&gt; (if you version it) or diff against backup&lt;/td&gt;
&lt;td&gt;Catches “one-line” regressions (deny list, allowFrom, new tools).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Rotate&lt;/strong&gt; &lt;code&gt;${OPENCLAW_GATEWAY_TOKEN}&lt;/code&gt; on any hint of leak; restart gateway.&lt;/td&gt;
&lt;td&gt;Prevents silent MITM in your own LAN / tunnel misconfig.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Re-read &lt;code&gt;SOUL.md&lt;/code&gt; and &lt;code&gt;SKILL.md&lt;/code&gt; together&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Policy drift is the silent killer.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Prune&lt;/strong&gt; old sessions/media if you use &lt;code&gt;maintenance&lt;/code&gt; / disk tools&lt;/td&gt;
&lt;td&gt;Stops unbounded &lt;code&gt;workspace/media&lt;/code&gt; and session stores.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Review&lt;/strong&gt; &lt;code&gt;logging.redactSensitive&lt;/code&gt; and &lt;code&gt;redactPatterns&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Add patterns for new PII you introduced (cities, domains, not only phone regex).&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
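
&lt;p&gt;A minimal sketch of the logging corner of &lt;code&gt;openclaw.json&lt;/code&gt; that the last habit refers to. Shape only: the key names come from the table above, but the value layout and the example patterns are assumptions; check your build’s schema before copying.&lt;/p&gt;

```json
"logging": {
  "redactSensitive": true,
  "redactPatterns": [
    "\\+\\d{7,15}",
    "my-home-town|my-employer\\.example"
  ]
}
```

&lt;p&gt;The point of the second pattern: redaction should grow with your life (cities, domains), not stay frozen at the phone regex.&lt;/p&gt;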




&lt;h3&gt;
  
  
  &lt;strong&gt;2. Safety Checklist (First Deploy + Ongoing)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Brain&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] One primary model; provider &lt;code&gt;baseUrl&lt;/code&gt; and env keys are correct&lt;/li&gt;
&lt;li&gt;[ ] &lt;code&gt;workspace&lt;/code&gt; path points at the folder you back up&lt;/li&gt;
&lt;li&gt;[ ] &lt;code&gt;compaction&lt;/code&gt; enabled if you have long threads&lt;/li&gt;
&lt;li&gt;[ ] &lt;code&gt;AGENTS.md&lt;/code&gt; / &lt;code&gt;user.md&lt;/code&gt; / &lt;code&gt;identity.md&lt;/code&gt; / &lt;code&gt;SOUL.md&lt;/code&gt; are &lt;strong&gt;short, aligned, and non-diary&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Voice&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] &lt;code&gt;silas-shield&lt;/code&gt; (or your equivalent) is enabled on the right agent&lt;/li&gt;
&lt;li&gt;[ ] &lt;code&gt;hash.py&lt;/code&gt; has &lt;code&gt;${SILAS_SALT}&lt;/code&gt; in the process environment, not in prompts&lt;/li&gt;
&lt;li&gt;[ ] &lt;code&gt;shield.py&lt;/code&gt; checks are wired the way &lt;em&gt;your&lt;/em&gt; OpenClaw build expects (hooks, commands)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Senses&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] &lt;code&gt;openai-image-gen&lt;/code&gt; denied until you &lt;em&gt;want&lt;/em&gt; it&lt;/li&gt;
&lt;li&gt;[ ] &lt;code&gt;tools.media&lt;/code&gt; default deny + allow rules for &lt;strong&gt;only&lt;/strong&gt; the threads you trust&lt;/li&gt;
&lt;li&gt;[ ] &lt;code&gt;mediaMaxMb&lt;/code&gt; matches your real usage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Connection&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] &lt;code&gt;channels.whatsapp.enabled&lt;/code&gt; + &lt;code&gt;allowFrom&lt;/code&gt; + &lt;code&gt;dmPolicy&lt;/code&gt; + &lt;code&gt;groupPolicy&lt;/code&gt; match your life&lt;/li&gt;
&lt;li&gt;[ ] &lt;code&gt;gateway&lt;/code&gt; bind mode matches threat model (loopback by default; widen only on purpose)&lt;/li&gt;
&lt;li&gt;[ ] &lt;code&gt;debounceMs&lt;/code&gt; high enough to stop duplicate work, low enough to feel live&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;This series does not list your phone numbers, tokens, or keys&lt;/strong&gt;. The checklist is about &lt;em&gt;the shape of a healthy install&lt;/em&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
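
&lt;p&gt;For the &lt;strong&gt;Brain&lt;/strong&gt; items, a shape-only sketch. The key names mirror the checklist; the exact nesting and value shapes are assumptions, not a schema reference.&lt;/p&gt;

```json
{
  "model": "one-primary-model",
  "provider": { "baseUrl": "https://api.example.com/v1" },
  "workspace": "/path/you/actually/back/up",
  "compaction": { "enabled": true }
}
```

&lt;p&gt;Provider keys and &lt;code&gt;${OPENCLAW_GATEWAY_TOKEN}&lt;/code&gt; belong in the process environment, never in a committed copy of this file.&lt;/p&gt;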




&lt;h3&gt;
  
  
  &lt;strong&gt;3. Future Outlook: Translation and Lingodotdev&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The Shield implementation keeps translation in the Python stack, as planned; JS shims exist for a future Lingodotdev path. A sane roadmap:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;First:&lt;/strong&gt; get &lt;strong&gt;local&lt;/strong&gt; &lt;code&gt;shield.py&lt;/code&gt; + &lt;code&gt;pre_screener.py&lt;/code&gt; + &lt;code&gt;script_detector.py&lt;/code&gt; correct — zero marginal API cost, deterministic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Then:&lt;/strong&gt; add optional Lingo (or any translation service) &lt;em&gt;only&lt;/em&gt; for messages that pass the cheap gates and you explicitly budget for&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Never:&lt;/strong&gt; send entire conversation history to translation; translate &lt;strong&gt;candidate spans&lt;/strong&gt; with redaction&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Cultural nuance (again):&lt;/strong&gt; translation is a &lt;strong&gt;user-experience&lt;/strong&gt; tool, not a &lt;strong&gt;security&lt;/strong&gt; primitive. The policy still comes from the skill + &lt;code&gt;SOUL.md&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;4. Remote Access: Tailscale vs Expose Port&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;gateway.tailscale&lt;/code&gt; exists in the schema as a switch; mine is &lt;code&gt;off&lt;/code&gt; today. The trade is familiar:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Off / loopback&lt;/strong&gt;: best default for a home install&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tailscale (or same VPN)&lt;/strong&gt;: reach the gateway from your phone &lt;em&gt;without&lt;/em&gt; public port 18789&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Raw public port&lt;/strong&gt;: only with additional auth, rate limits, and the expectation of scrapers&lt;/li&gt;
&lt;/ul&gt;
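
&lt;p&gt;In config terms, the three options above differ in only a few fields of the &lt;code&gt;gateway&lt;/code&gt; block (same shape as the block shown in part 4; these values are the conservative defaults I run):&lt;/p&gt;

```json
"gateway": {
  "bind": "loopback",
  "tailscale": { "mode": "off", "resetOnExit": false }
}
```

&lt;p&gt;Flipping &lt;code&gt;tailscale.mode&lt;/code&gt; on is the VPN row; widening &lt;code&gt;bind&lt;/code&gt; beyond loopback is the raw-public-port row, which then needs the additional auth and rate limits named above.&lt;/p&gt;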

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Practical Guide&lt;/em&gt; rule: &lt;strong&gt;never ship “security by obscurity on port 18789.”&lt;/strong&gt; If it is on the internet, it must assume it is &lt;em&gt;scanned&lt;/em&gt;.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;5. Ecosystem and Ethos&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;OpenClaw and projects like a personal “Clawdbot” show the same idea: the operator owns the stack, the model is a component, and &lt;strong&gt;policy is code + markdown you can read&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;em&gt;Practical Guide&lt;/em&gt; series is my contribution for &lt;strong&gt;first-time&lt;/strong&gt; builders: you do not need a novel architecture on day one. You need a &lt;strong&gt;boring, testable, layered&lt;/strong&gt; one: Brain, Voice, Senses, Connection.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusion:&lt;/strong&gt; ship small, log carefully, &lt;strong&gt;deny by default&lt;/strong&gt;, and treat every new channel as a new &lt;strong&gt;firewall&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Series (reading order)&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://dev.to/nadinev/the-brain-setting-up-openclaw-jd9"&gt;&lt;em&gt;(The Brain) Setting Up OpenClaw.txt&lt;/em&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/nadinev/the-voice-multilingual-layer-4mf0"&gt;&lt;em&gt;(The-Voice) MultilingualLayer&lt;/em&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/nadinev/the-senses-image-generation-media-266k"&gt;&lt;em&gt;(The Senses) Image Generation and Media&lt;/em&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href="https://dev.to/nadinev/the-connection-whatsapp-bridge-1962"&gt;(The Connection) WhatsApp Bridge&lt;/a&gt;&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href="https://dev.to/nadinev/future-outlook-operating-model-8jp"&gt;This Article: Future Outlook and Operating Model&lt;/a&gt;&lt;/em&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;&lt;strong&gt;Further Reading&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://1688.pixel-geist.co.za/1" rel="noopener noreferrer"&gt;OpenClaw Skill Shield: Multilingual Edition&lt;/a&gt; - a standalone deep dive into Silas, PII handling, multilingual gaps, and config tables.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>devops</category>
      <category>openclaw</category>
    </item>
    <item>
      <title>(The Connection) WhatsApp Bridge</title>
      <dc:creator>Nadine </dc:creator>
      <pubDate>Mon, 27 Apr 2026 02:05:47 +0000</pubDate>
      <link>https://dev.to/nadinev/the-connection-whatsapp-bridge-1962</link>
      <guid>https://dev.to/nadinev/the-connection-whatsapp-bridge-1962</guid>
      <description>&lt;h3&gt;
  
  
  &lt;strong&gt;The Catalyst: The Interface Is the Attack Surface&lt;/strong&gt;
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;WhatsApp is the ultimate messaging interface: it is on every phone and it is end-to-end encrypted. The brain can be perfect; the &lt;strong&gt;connection&lt;/strong&gt; is where pairing, allowlists, and gateway auth decide who gets to talk to the bot at all.&lt;/p&gt;

&lt;p&gt;Phase 4 of the Practical Guide series is &lt;strong&gt;The Connection&lt;/strong&gt;: gateway, plugin, channel policy, DM scoping, and groups. I run the gateway on the host (no Docker in my day-to-day path).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Overview&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Parts 1 to 3 already gave you a &lt;strong&gt;model&lt;/strong&gt;, &lt;strong&gt;Silas / policy&lt;/strong&gt;, and &lt;strong&gt;media scopes&lt;/strong&gt;. This article is where those meet the &lt;strong&gt;real wire&lt;/strong&gt;: who may send messages, how the OpenClaw &lt;strong&gt;gateway&lt;/strong&gt; sees them, and how &lt;strong&gt;session keys&lt;/strong&gt; line up with &lt;code&gt;tools.media&lt;/code&gt; &lt;code&gt;keyPrefix&lt;/code&gt; rules from the Senses article. Expect &lt;strong&gt;more channel wiring&lt;/strong&gt; here than in part 1, not a second lecture on the same &lt;code&gt;openclaw.json&lt;/code&gt; fields from scratch.&lt;/p&gt;

&lt;p&gt;My configuration enables the WhatsApp plugin, constrains the gateway, and makes session isolation explicit at the &lt;strong&gt;session&lt;/strong&gt; layer in addition to skills:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Area&lt;/th&gt;
&lt;th&gt;My settings (concept)&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;plugins.entries.whatsapp&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;enabled: true&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Turns on the channel integration.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;channels.whatsapp&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;enabled&lt;/code&gt;, &lt;code&gt;dmPolicy&lt;/code&gt;, &lt;code&gt;selfChatMode&lt;/code&gt;, &lt;code&gt;allowFrom&lt;/code&gt;, &lt;code&gt;groupPolicy&lt;/code&gt;, &lt;code&gt;debounceMs&lt;/code&gt;, &lt;code&gt;mediaMaxMb&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Who may DM, how groups are gated, and transport limits.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;session.dmScope&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;per-channel-peer&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;DMs are not one global blob; pair identity with the channel+peer.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;session.reset&lt;/code&gt; / &lt;code&gt;session.maintenance&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;idle reset + &lt;code&gt;pruneAfter&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Stops sessions from living forever in RAM/disk.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;gateway&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;port&lt;/code&gt;, &lt;code&gt;mode&lt;/code&gt;, &lt;code&gt;bind&lt;/code&gt;, &lt;code&gt;auth&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Where the local gateway listens and how clients authenticate.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Host process&lt;/td&gt;
&lt;td&gt;e.g. &lt;code&gt;gateway.cmd&lt;/code&gt; + Node on Windows&lt;/td&gt;
&lt;td&gt;I start OpenClaw’s gateway as a &lt;strong&gt;normal process&lt;/strong&gt; on the machine that owns &lt;code&gt;.openclaw/&lt;/code&gt;. No container in my run.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;No live secrets in documentation.&lt;/strong&gt; Use &lt;code&gt;${OPENCLAW_GATEWAY_TOKEN}&lt;/code&gt; in examples; generate a long random token and never paste it into chat or public repos.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  In this section:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;1. Gateway: Port, Loopback, Token Auth&lt;/li&gt;
&lt;li&gt;2. How I run the gateway (no Docker)&lt;/li&gt;
&lt;li&gt;3. Channel: DMs, Pairing, and Allowlists&lt;/li&gt;
&lt;li&gt;4. Group Mentions: When the Bot Wakes Up&lt;/li&gt;
&lt;li&gt;5. Webhooks, Real Time, and Where the Phone Meets the Gateway&lt;/li&gt;
&lt;li&gt;6. session.dmScope and the Heartbeat of Trust&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;1. Gateway: Port, Loopback, Token Auth&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;A typical &lt;code&gt;gateway&lt;/code&gt; block (shape only):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"gateway"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"port"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;18789&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"local"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"bind"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"loopback"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"auth"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"mode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"token"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"token"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${OPENCLAW_GATEWAY_TOKEN}"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tailscale"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"mode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"off"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"resetOnExit"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;bind: "loopback"&lt;/code&gt;&lt;/strong&gt;: the gateway is not a wide-open LAN service by default. If you need remote access, that is a deliberate &lt;em&gt;Connection&lt;/em&gt; project (see Article-05: Tailscale vs public IP).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;auth.mode: "token"&lt;/code&gt;&lt;/strong&gt;: every client that hits the gateway must know the token. &lt;strong&gt;Rotate&lt;/strong&gt; if a token ever leaks; treat it like a password.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;gateway.nodes.denyCommands&lt;/code&gt;&lt;/strong&gt; (if present): I deny a set of high-impact device/calendar style commands at the node layer. Adjust to match what you are willing to expose from a phone bridge.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;On Windows,&lt;/strong&gt; you may have a &lt;code&gt;gateway.cmd&lt;/code&gt; that starts Node with the OpenClaw package. Do not commit this file to public repos; it often contains a resolved token. Prefer env-based injection for docs.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A &lt;strong&gt;plain text reply&lt;/strong&gt; from the “brain” can still take &lt;strong&gt;tens of seconds&lt;/strong&gt; on a slow model or a long context. On WhatsApp that feels like a hang. I already use &lt;strong&gt;&lt;code&gt;debounceMs&lt;/code&gt;&lt;/strong&gt; so one tap does not double-fire; if your &lt;strong&gt;bridge&lt;/strong&gt; exposes &lt;strong&gt;typing&lt;/strong&gt; or a &lt;strong&gt;read&lt;/strong&gt; signal, a short “thinking” state helps more than a faster logo. The fix is &lt;strong&gt;UX&lt;/strong&gt;, not more tokens in the system prompt.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;2. How I run the gateway (no Docker)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;I &lt;strong&gt;do not&lt;/strong&gt; run the OpenClaw gateway in Docker. The gateway is a &lt;strong&gt;local&lt;/strong&gt; process on the same machine as my &lt;code&gt;.openclaw&lt;/code&gt; tree, with env vars (including &lt;code&gt;${OPENCLAW_GATEWAY_TOKEN}&lt;/code&gt;) set the way a normal app expects. Some OpenClaw trees ship a &lt;strong&gt;sample&lt;/strong&gt; &lt;code&gt;docker-compose.yml&lt;/code&gt; for people who want a container; that is &lt;strong&gt;optional&lt;/strong&gt; and aimed at other deployments, not my path. If you adopt containers later, the &lt;strong&gt;shape&lt;/strong&gt; of &lt;code&gt;openclaw.json&lt;/code&gt; does not change; only &lt;strong&gt;how&lt;/strong&gt; you start the process does.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;3. Channel: DMs, Pairing, and Allowlists&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Key fields in &lt;code&gt;channels.whatsapp&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;dmPolicy: pairing&lt;/code&gt;&lt;/strong&gt;: unknown numbers should not get full access until a pairing/approval path completes (your OpenClaw version defines the exact UX).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;allowFrom&lt;/code&gt;&lt;/strong&gt;: E.164 allowlist. &lt;strong&gt;Yours&lt;/strong&gt; goes here; in shared docs, describe the &lt;em&gt;pattern&lt;/em&gt; (“owner + a trusted test number only”).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;selfChatMode: true&lt;/code&gt;&lt;/strong&gt;: useful when you are your own “first user” in the same app session.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;groupPolicy: "allowlist"&lt;/code&gt;&lt;/strong&gt;: groups are not open season; only listed groups (per product docs) should get bot participation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;debounceMs&lt;/code&gt;&lt;/strong&gt;: I use ~1500 ms to absorb double-tap sends and flappy connectivity before the agent does expensive work.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;mediaMaxMb&lt;/code&gt;&lt;/strong&gt;: cap attachments so the connection cannot be used as a free CDN stress test.&lt;/li&gt;
&lt;/ul&gt;
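
&lt;p&gt;Put together, a shape-only &lt;code&gt;channels.whatsapp&lt;/code&gt; block. Field names are the ones listed above; the &lt;code&gt;allowFrom&lt;/code&gt; entry is a placeholder and &lt;code&gt;mediaMaxMb&lt;/code&gt; is an example value, not a recommendation — verify spelling and nesting against your build.&lt;/p&gt;

```json
"channels": {
  "whatsapp": {
    "enabled": true,
    "dmPolicy": "pairing",
    "selfChatMode": true,
    "allowFrom": ["+1XXXXXXXXXX"],
    "groupPolicy": "allowlist",
    "debounceMs": 1500,
    "mediaMaxMb": 16
  }
}
```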

&lt;p&gt;&lt;strong&gt;Mental model:&lt;/strong&gt; &lt;em&gt;pairing&lt;/em&gt; + &lt;em&gt;allowlist&lt;/em&gt; = &lt;strong&gt;identity-based firewall&lt;/strong&gt; in front of the LLM.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;allowFrom&lt;/code&gt; and the “owner” in parts 2 and 3:&lt;/strong&gt; &lt;code&gt;channels.whatsapp.allowFrom&lt;/code&gt; is the &lt;strong&gt;E.164 allowlist&lt;/strong&gt; of who may talk to the bot on this channel. The &lt;strong&gt;“operator”&lt;/strong&gt; or &lt;strong&gt;“owner”&lt;/strong&gt; phrasing in &lt;code&gt;SKILL.md&lt;/code&gt; and &lt;strong&gt;owner-only&lt;/strong&gt; media in part 3 should match &lt;strong&gt;your&lt;/strong&gt; trusted thread: the same &lt;strong&gt;session identity&lt;/strong&gt; the bridge uses for &lt;strong&gt;your&lt;/strong&gt; DM, which is also the one you target with a &lt;code&gt;tools.media&lt;/code&gt; &lt;strong&gt;keyPrefix&lt;/strong&gt; like &lt;code&gt;whatsapp:direct:+1XXXXXXXXXX&lt;/code&gt; in examples (use &lt;strong&gt;your&lt;/strong&gt; real prefix in your private config, not a copy from a blog). Strangers in group chats that are not in your model do not get a casual path to that session.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Twilio and Business WhatsApp (my stack)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;I use &lt;strong&gt;Twilio&lt;/strong&gt; with a &lt;strong&gt;WhatsApp Business&lt;/strong&gt; number provisioned the way &lt;strong&gt;Twilio’s&lt;/strong&gt; WhatsApp product documents (this is &lt;strong&gt;not&lt;/strong&gt; the same step list as a raw &lt;strong&gt;Meta Cloud API&lt;/strong&gt; hand-build, and it is not a &lt;strong&gt;headless browser&lt;/strong&gt; / Puppeteer bridge). Pick &lt;strong&gt;one&lt;/strong&gt; guide and follow it end to end or you will mix credentials.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For beginners:&lt;/strong&gt; people on the &lt;strong&gt;allowlist&lt;/strong&gt; are still talking to &lt;strong&gt;your&lt;/strong&gt; WhatsApp identity. They will read the bot as &lt;strong&gt;you&lt;/strong&gt;, not a new contact card, so treat the allowlist as “who is allowed to make my number say agent output.”&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Groups:&lt;/strong&gt; the bot only speaks when the patterns match a &lt;strong&gt;mention&lt;/strong&gt; of the assistant (see &lt;strong&gt;Group Mentions&lt;/strong&gt; below and your &lt;code&gt;groupPolicy&lt;/code&gt;). It does not narrate the whole group by default.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;4. Group Mentions: When the Bot Wakes Up&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;messages.groupChat.mentionPatterns&lt;/code&gt; list includes variants of the assistant’s call name and casual “hey” forms, so the agent does not spam a whole group on every off-topic line.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ackReactionScope: "group-mentions"&lt;/code&gt; (if supported in your build) keeps acknowledgement behaviour scoped.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Match patterns to the &lt;strong&gt;name&lt;/strong&gt; you set under &lt;code&gt;ui.assistant&lt;/code&gt; (I use &lt;code&gt;Clawd&lt;/code&gt; in the UI; patterns reference &lt;code&gt;@clawd&lt;/code&gt; and similar). Keep patterns short enough to be memorable, not so broad that every line triggers the model.&lt;/p&gt;
&lt;/blockquote&gt;
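
&lt;p&gt;A shape-only sketch of the mention wiring. The pattern strings are illustrative, and where &lt;code&gt;ackReactionScope&lt;/code&gt; nests depends on your build:&lt;/p&gt;

```json
"messages": {
  "groupChat": {
    "mentionPatterns": ["@clawd", "clawd:", "hey clawd"]
  },
  "ackReactionScope": "group-mentions"
}
```

&lt;p&gt;Three short, memorable patterns beat one broad regex: every false-positive match is a model call the whole group sees.&lt;/p&gt;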




&lt;h3&gt;
  
  
  &lt;strong&gt;5. Webhooks, Real Time, and Where the Phone Meets the Gateway&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Exact webhook URLs differ by host release. The &lt;strong&gt;contract&lt;/strong&gt; in any setup is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Provider or bridge&lt;/strong&gt; (WhatsApp / Meta / Baileys / etc.) posts events to your gateway.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gateway&lt;/strong&gt; authenticates, normalises, and hands messages to the agent with a stable &lt;strong&gt;session key&lt;/strong&gt; (the same family of keys used in the &lt;code&gt;tools.media&lt;/code&gt; &lt;code&gt;keyPrefix&lt;/code&gt; rules in part 3).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Session&lt;/strong&gt; and &lt;strong&gt;skill&lt;/strong&gt; policies apply: Shield, log redaction, tool allow/deny lists.&lt;/li&gt;
&lt;/ol&gt;
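&lt;p&gt;As a sketch, the normalised event the gateway hands the agent could look roughly like this (the field names are illustrative, not a documented schema; only the channel and session-key shape follow the contract above):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "channel": "whatsapp",
  "sessionKey": "whatsapp:direct:+1XXXXXXXXXX",
  "from": "+1XXXXXXXXXX",
  "type": "text",
  "body": "hello",
  "receivedAt": "2026-04-27T02:08:00Z"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;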

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Troubleshooting without panic:&lt;/strong&gt; if messages stop flowing, check (port → token → channel enabled → allowFrom → plugin enabled), in that order. Nine times out of ten it is a stale &lt;strong&gt;token&lt;/strong&gt; or a gateway &lt;strong&gt;restarted&lt;/strong&gt; without the same env as before.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;6. session.dm Scope and Heartbeat of Trust&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;per-channel-peer&lt;/code&gt; means: “&lt;strong&gt;this&lt;/strong&gt; DM thread is not &lt;strong&gt;that&lt;/strong&gt; DM thread”. This is important when you later have more than one human talking to the same bot account.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pair with&lt;/strong&gt; maintenance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;session.reset.idleMinutes&lt;/code&gt;: lets long-idle DMs reset context predictably&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;session.maintenance&lt;/code&gt;: prunes sessions after a defined window (e.g. 7d) if you do not need infinite retention&lt;/li&gt;
&lt;/ul&gt;
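&lt;p&gt;Put together, a hedged sketch of the session block (the 240 minutes is just an example, and the &lt;code&gt;pruneAfter&lt;/code&gt; key name is an assumption; use whatever your build’s &lt;code&gt;session.maintenance&lt;/code&gt; schema actually calls it):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;"session": {
  "dmScope": "per-channel-peer",
  "reset": { "idleMinutes": 240 },
  "maintenance": { "pruneAfter": "7d" }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;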

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The Connection article&lt;/strong&gt; is not the &lt;strong&gt;Voice&lt;/strong&gt; (Shield) article. Both are needed: wiring decides who &lt;em&gt;may&lt;/em&gt; connect; policy decides what they &lt;em&gt;may&lt;/em&gt; do once connected.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Conclusion (Phase 4):&lt;/strong&gt; treat WhatsApp as a &lt;strong&gt;public API&lt;/strong&gt; to your home lab. &lt;strong&gt;Pair, allowlist, token-auth the gateway, debounce, and cap media&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Series navigation&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Previous: &lt;em&gt;&lt;a href="https://dev.to/nadinev/the-senses-image-generation-media-266k"&gt;The Senses&lt;/a&gt;&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Next: &lt;em&gt;&lt;a href="https://dev.to/nadinev/future-outlook-operating-model-8jp"&gt;Future Outlook &amp;amp; Operating Model&lt;/a&gt;&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>agents</category>
      <category>architecture</category>
      <category>security</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>(The Senses) Image Generation &amp; Media</title>
      <dc:creator>Nadine </dc:creator>
      <pubDate>Mon, 27 Apr 2026 02:03:51 +0000</pubDate>
      <link>https://dev.to/nadinev/the-senses-image-generation-media-266k</link>
      <guid>https://dev.to/nadinev/the-senses-image-generation-media-266k</guid>
      <description>&lt;h3&gt;
  
  
  &lt;strong&gt;The Catalyst&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Field note: Nano Banana Pro and reactive image gen&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I hit a real workflow failure mode: a &lt;strong&gt;proactive&lt;/strong&gt; image stack (Nano Banana Pro) that would &lt;strong&gt;spontaneously&lt;/strong&gt; generate images while the agent was effectively listening in on a chat. Worse, it would &lt;strong&gt;regenerate images people had shared&lt;/strong&gt; in the conversation. That was the biggest day-to-day nuisance, and a big part of why I went with OpenAI’s image API &lt;strong&gt;(DALL·E)&lt;/strong&gt; path: &lt;strong&gt;only generate when explicitly asked&lt;/strong&gt;, not because the conversation &lt;em&gt;suggested&lt;/em&gt; something visual.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Eyes that see too much&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The moment you add &lt;strong&gt;images, audio, and video&lt;/strong&gt;, the model can see your camera roll path, a cached thumbnail, or a viral meme. A malicious payload can be &lt;strong&gt;in&lt;/strong&gt; the image, not the caption. I wanted senses (multimedia) without giving the model &lt;strong&gt;surveillance&lt;/strong&gt; over my disk or a blank cheque to generate images for strangers.&lt;/p&gt;

&lt;p&gt;Phase 3 of the series is &lt;strong&gt;The Senses&lt;/strong&gt;: how OpenClaw exposes images, how you &lt;strong&gt;deny&lt;/strong&gt; image-generation tools by default, how &lt;strong&gt;allow-scopes&lt;/strong&gt; work per channel, and how to keep users engaged while a heavy operation runs.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Covered in other articles:&lt;/strong&gt; identity leakage via workspace files and cached media (see &lt;em&gt;&lt;a href="https://1688.pixel-geist.co.za/1" rel="noopener noreferrer"&gt;OpenClaw Skill Shield&lt;/a&gt;&lt;/em&gt; and &lt;em&gt;&lt;a href="https://dev.to/nadinev/the-brain-setting-up-openclaw-1p4i-temp-slug-7665266?preview=43b5e5eb7fd5a835a0c1c38d54196ee9c20d257afa4ccfbcdacc18cd941c0d21a54574b7dd46e55daddc9664730c91be2d64dae5791a9831358805ee"&gt;Setting up OpenClaw&lt;/a&gt;&lt;/em&gt;). Here the focus is &lt;strong&gt;tooling and config&lt;/strong&gt; for multimedia.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Overview&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;In my &lt;code&gt;openclaw.json&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;tools.deny&lt;/code&gt;&lt;/strong&gt; includes &lt;code&gt;openai-image-gen&lt;/code&gt; at the top level, so the model has no casual path to DALL·E tools even if the skill package is installed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;tools.media&lt;/code&gt;&lt;/strong&gt; enables image, audio, and video, each with a &lt;strong&gt;default deny&lt;/strong&gt; and &lt;strong&gt;explicit allow rules&lt;/strong&gt; that match a channel and a &lt;code&gt;keyPrefix&lt;/code&gt; (e.g. your owner WhatsApp direct thread key, expressed as a &lt;strong&gt;placeholder&lt;/strong&gt; in docs).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;skills.entries.openai-image-gen&lt;/code&gt;&lt;/strong&gt; can still hold &lt;code&gt;${OPENAI_API_KEY}&lt;/code&gt; for when you &lt;em&gt;deliberately&lt;/em&gt; re-enable the skill in a controlled way.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;strong&gt;Silas&lt;/strong&gt; skill (&lt;code&gt;SKILL.md&lt;/code&gt;) adds behavioural law: do not call image-gen tools for non-operator sessions, treat blocked vision input as blocked, and never guess the pixels.&lt;/p&gt;




&lt;h3&gt;
  
  
  In this section:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;1. Image Generation: Deny First, Enable Deliberately&lt;/li&gt;
&lt;li&gt;2. Inbound Media: Scopes, Not “On for the World”&lt;/li&gt;
&lt;li&gt;3. Filesystem: Workspace-Only&lt;/li&gt;
&lt;li&gt;4. Latency: Keeping Humans Calm While “Senses” Work&lt;/li&gt;
&lt;li&gt;5. Checklist: Senses in Production&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;1. Image Generation: Deny First, Enable Deliberately&lt;/strong&gt;
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mechanism&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;tools.deny: ["openai-image-gen"]&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;A global deny list removes the tool from the agent’s easy reach.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Skill config &lt;code&gt;openai-image-gen.apiKey&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;When you &lt;em&gt;do&lt;/em&gt; enable, keys live in env, not in chat logs.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;SKILL.md&lt;/code&gt; image-gen section&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Behavioural&lt;/strong&gt; backstop: even if a tool slipped through, the model is instructed to refuse for non-operator contacts.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;New-user default:&lt;/strong&gt; start with &lt;code&gt;openai-image-gen&lt;/code&gt; &lt;strong&gt;denied&lt;/strong&gt; until you have (a) a billing/usage cap you accept, and (b) a clear “who may request images” policy (owner session vs everyone). The &lt;strong&gt;Connection&lt;/strong&gt; article (part 4) explains how my &lt;strong&gt;WhatsApp bridge&lt;/strong&gt; maps &lt;code&gt;allowFrom&lt;/code&gt; and session keys to &lt;strong&gt;who counts as the operator&lt;/strong&gt;, so that “owner-only” in config and in &lt;code&gt;SKILL.md&lt;/code&gt; means the same person in practice.&lt;/p&gt;
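&lt;p&gt;The deny-first shape from the table, sketched as config (the key paths follow the &lt;code&gt;tools.deny&lt;/code&gt; and &lt;code&gt;skills.entries&lt;/code&gt; naming used in this series; adjust to your build):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;"tools": {
  "deny": ["openai-image-gen"]
},
"skills": {
  "entries": {
    "openai-image-gen": {
      "apiKey": "${OPENAI_API_KEY}"
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;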




&lt;h3&gt;
  
  
  &lt;strong&gt;2. Inbound Media: Scopes, Not “On for the World”&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;tools.media&lt;/code&gt; for &lt;code&gt;image&lt;/code&gt; / &lt;code&gt;audio&lt;/code&gt; / &lt;code&gt;video&lt;/code&gt; shares the same pattern:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;"default": "deny"&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;"rules"&lt;/code&gt;: one or more &lt;code&gt;{ "action": "allow", "match": { "channel": "whatsapp", "keyPrefix": "..." } }&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What &lt;code&gt;keyPrefix&lt;/code&gt; means in practice:&lt;/strong&gt; it is a channel-specific routing key. Your OpenClaw build should document the exact string format; treat it as a &lt;strong&gt;capability&lt;/strong&gt;: only the threads you list get inbound multimodal access at the tool layer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example (use your own key prefix, not a copy-paste of someone’s phone number):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"media"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"image"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"enabled"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"scope"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"default"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"deny"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"rules"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"allow"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"match"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"channel"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"whatsapp"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"keyPrefix"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"whatsapp:direct:+1XXXXXXXXXX"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Repeat the same idea for &lt;code&gt;audio&lt;/code&gt; and &lt;code&gt;video&lt;/code&gt; if you want symmetric behaviour. If a modality should stay &lt;strong&gt;off entirely&lt;/strong&gt;, set &lt;code&gt;enabled: false&lt;/code&gt; for that block instead of relying on empty rules.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;channels.whatsapp.mediaMaxMb&lt;/code&gt;:&lt;/strong&gt; set an upper bound (my config uses 50 MB) so a single “document as video” cannot exhaust disk or the gateway.&lt;/p&gt;
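&lt;p&gt;Sketched as config (50 MB is my choice, not a recommended constant):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;"channels": {
  "whatsapp": {
    "mediaMaxMb": 50
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;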




&lt;h3&gt;
  
  
  &lt;strong&gt;3. Filesystem: Workspace-Only&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;tools.fs.workspaceOnly: true&lt;/code&gt; means the model’s file tools are anchored to the configured workspace, not an arbitrary path. That pairs with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Inbound media cache living under your OpenClaw media areas (separate from random OS paths, depending on your build)&lt;/li&gt;
&lt;li&gt;Outbound or generated files you intentionally place under &lt;code&gt;workspace/media/...&lt;/code&gt; when you &lt;em&gt;want&lt;/em&gt; the agent to reference them&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Practical guide rule:&lt;/strong&gt; if the LLM can &lt;em&gt;read&lt;/em&gt; a file, assume it can be &lt;strong&gt;summarised or exfiltrated&lt;/strong&gt; unless session + skills forbid it. Deny is the default; allow is a contract.&lt;/p&gt;
&lt;/blockquote&gt;
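&lt;p&gt;The anchoring itself is one flag; a sketch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;"tools": {
  "fs": {
    "workspaceOnly": true
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;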




&lt;h3&gt;
  
  
  &lt;strong&gt;4. Latency: Keeping Humans Calm While “Senses” Work&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; A minute of silence feels like a dropped message, especially on WhatsApp.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Patterns that work:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;ACK early&lt;/strong&gt; where your channel allows it (reactions, short “Received, processing” copy).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chunk work:&lt;/strong&gt; transcribe or describe in stages, not one giant block at the end.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Set expectations in &lt;code&gt;SOUL.md&lt;/code&gt; / identity:&lt;/strong&gt; the assistant can say it may take a few seconds for audio or large images.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Debounce (channel):&lt;/strong&gt; a longer &lt;code&gt;debounceMs&lt;/code&gt; on the WhatsApp channel reduces double-firing on slow networks. You trade a little latency for fewer duplicate heavy jobs. See the Connection article for &lt;code&gt;debounceMs&lt;/code&gt; as wiring, not as a speed hack.&lt;/li&gt;
&lt;/ul&gt;
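&lt;p&gt;The debounce knob sits on the channel config; a sketch (the 2000 ms value is illustrative, tune it to your network):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;"channels": {
  "whatsapp": {
    "debounceMs": 2000
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;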

&lt;p&gt;&lt;strong&gt;Reality check:&lt;/strong&gt; &lt;em&gt;fast model&lt;/em&gt; + &lt;em&gt;large media&lt;/em&gt; still hits API limits. The UX fix is &lt;strong&gt;communication&lt;/strong&gt;, not overpromising in the system prompt.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A cultural note (ties to the Voice article):&lt;/strong&gt; when replying in a second language, a short &lt;em&gt;localised&lt;/em&gt; “working on it” line often lands better than English.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;5. Checklist: Senses in Production&lt;/strong&gt;
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Check&lt;/th&gt;
&lt;th&gt;You want&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Image gen&lt;/td&gt;
&lt;td&gt;Deny tool globally until policy is explicit.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Inbound image/audio/video&lt;/td&gt;
&lt;td&gt;Default deny; allow only named channel + key prefix.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model behaviour&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;SKILL.md&lt;/code&gt; matches config (no “secret” image gen path).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Disk and limits&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;mediaMaxMb&lt;/code&gt; sane; monitor &lt;code&gt;workspace/media&lt;/code&gt; growth.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;User trust&lt;/td&gt;
&lt;td&gt;Early ACK + honest latency messaging.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Conclusion (Phase 3)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The Senses are optional superpowers. &lt;strong&gt;Default closed, open on purpose, behaviourally enforced&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Series navigation&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Previous: &lt;em&gt;&lt;a href="https://dev.to/nadinev/the-voice-multilingual-layer-4mf0"&gt;The Voice&lt;/a&gt;&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Next: &lt;em&gt;&lt;a href="https://dev.to/nadinev/the-connection-whatsapp-bridge-1962"&gt;The Connection (WhatsApp Bridge)&lt;/a&gt;&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>openai</category>
      <category>privacy</category>
      <category>security</category>
    </item>
    <item>
      <title>(The Voice) Multilingual Layer</title>
      <dc:creator>Nadine </dc:creator>
      <pubDate>Mon, 27 Apr 2026 02:02:37 +0000</pubDate>
      <link>https://dev.to/nadinev/the-voice-multilingual-layer-4mf0</link>
      <guid>https://dev.to/nadinev/the-voice-multilingual-layer-4mf0</guid>
      <description>&lt;h3&gt;
  
  
  &lt;strong&gt;The Catalyst: One Language, Many Attack Surfaces&lt;/strong&gt;
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;The comfortable fiction is: “We wrote English rules, so the model is safe.” The truth: LLMs are multilingual. A user can request the same jailbreak in another script, mix Latin keywords into CJK text, or hide instructions behind homoglyphs. If your policy lives only in English sentences, you have not policed the channel.&lt;/p&gt;

&lt;p&gt;Phase 2 of the &lt;em&gt;Practical Guide&lt;/em&gt; series is the &lt;strong&gt;Voice&lt;/strong&gt; layer: how to handle multiple languages and cultural nuance without giving attackers a free pass. The implementation detail is &lt;strong&gt;Silas Shield&lt;/strong&gt; (&lt;code&gt;silas-shield&lt;/code&gt;); the narrative is &lt;strong&gt;Language Sentry&lt;/strong&gt;. The same rules apply to every language.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Overview&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Skill Shield (Silas)&lt;/strong&gt; in my setup is a drop-in OpenClaw skill: &lt;code&gt;SKILL.md&lt;/code&gt; enforces vision rules, PII hashing, image-gen lockdown, cross-session isolation, and multilingual injection defence. The Python entry points (&lt;code&gt;shield.py&lt;/code&gt;, &lt;code&gt;script_detector.py&lt;/code&gt;, &lt;code&gt;pre_screener.py&lt;/code&gt;, &lt;code&gt;hash.py&lt;/code&gt;) run &lt;em&gt;locally&lt;/em&gt; for message checks. This is cheap and predictable compared with burning another LLM call per message.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Token budget (what actually burns money):&lt;/strong&gt; &lt;code&gt;shield.py&lt;/code&gt; runs on the &lt;strong&gt;host&lt;/strong&gt; before you spend model tokens on a bad message. The main &lt;strong&gt;context window&lt;/strong&gt; and &lt;strong&gt;compaction&lt;/strong&gt; you set in part 1 still decide how much &lt;strong&gt;history&lt;/strong&gt; the LLM sees. Silas is not, in my setup, a second hidden prompt that stacks on top of every reply and eats the 1M/day line by itself.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;This article does not replace&lt;/strong&gt; &lt;a href="https://1688.pixel-geist.co.za/1"&gt;&lt;em&gt;OpenClaw Skill Shield: Multilingual Edition&lt;/em&gt;&lt;/a&gt;. &lt;strong&gt;This guide orients new readers to the same architecture.&lt;/strong&gt; For module-by-module behaviour and test commands, use that piece as the blueprint.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;The multilingual gap (recap):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Default safety is often &lt;strong&gt;English&lt;/strong&gt;. Your friends are not.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code-switching&lt;/strong&gt; mid-message is a real technique to slip past naïve filters.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Homoglyphs&lt;/strong&gt; (Cyrillic &lt;em&gt;а&lt;/em&gt; for Latin &lt;em&gt;a&lt;/em&gt;) defeat string matching unless you normalize first.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Non-Latin + embedded Latin&lt;/strong&gt; can hide “ignore all instructions” inside an otherwise “foreign” blob. The pre-screener’s job is to treat that as suspicious, not to auto-block every greeting.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  In this section:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;1. How Silas Speaks to the Model&lt;/li&gt;
&lt;li&gt;2. The Detection Stack (Mental Model)&lt;/li&gt;
&lt;li&gt;3. Language Switching vs Context&lt;/li&gt;
&lt;li&gt;4. Key Takeaway Table (Voice Layer)&lt;/li&gt;
&lt;li&gt;Conclusion (Phase 2)&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;1. How Silas Speaks to the Model&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;SKILL.md&lt;/code&gt; (front matter &lt;code&gt;name: silas-shield&lt;/code&gt;, &lt;code&gt;always: true&lt;/code&gt; when configured that way) tells the agent to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run &lt;strong&gt;PII&lt;/strong&gt; through &lt;code&gt;hash.py&lt;/code&gt; with &lt;code&gt;${SILAS_SALT}&lt;/code&gt; in the environment&lt;/li&gt;
&lt;li&gt;Obey &lt;strong&gt;vision blinding&lt;/strong&gt; when content is marked blocked&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Never&lt;/strong&gt; call image-generation tools for non-operator sessions unless the operator clearly requested it in the right context&lt;/li&gt;
&lt;li&gt;Never leak &lt;strong&gt;across&lt;/strong&gt; WhatsApp (or other) sessions&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;&lt;code&gt;shield.py check --message "..." --json&lt;/code&gt;&lt;/strong&gt; when you need a structured allow/deny signal&lt;/li&gt;
&lt;/ul&gt;
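&lt;p&gt;A sketch of the skill entry that pairs with this (the &lt;code&gt;env&lt;/code&gt; key name is an assumption about the build; the point is that &lt;code&gt;${SILAS_SALT}&lt;/code&gt; comes from the environment, never from chat):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;"skills": {
  "entries": {
    "silas-shield": {
      "env": { "SILAS_SALT": "${SILAS_SALT}" }
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;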

&lt;p&gt;The &lt;em&gt;behaviour&lt;/em&gt; section of your workspace (e.g. &lt;code&gt;SOUL.md&lt;/code&gt; + &lt;code&gt;identity.md&lt;/code&gt;) should &lt;strong&gt;repeat the Language Sentry intent&lt;/strong&gt; in plain language so the model does not treat security as a side channel only the skill file knows about.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;2. The Detection Stack (Mental Model)&lt;/strong&gt;
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;File&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Script inventory&lt;/td&gt;
&lt;td&gt;&lt;code&gt;script_detector.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Non-Latin script detection across many Unicode ranges; homoglyph map and normalization.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Suspicion heuristics&lt;/td&gt;
&lt;td&gt;&lt;code&gt;pre_screener.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Token-ish estimates, “safe short” greetings, mixed-script and embedded-Latin patterns, long-message flags.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Orchestration&lt;/td&gt;
&lt;td&gt;&lt;code&gt;shield.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Homoglyph path → non-Latin path → pre-screen → keyword / block decisions; CLI and JSON.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PII output&lt;/td&gt;
&lt;td&gt;&lt;code&gt;hash.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Salted short hex digest so you never print raw PII.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Planned / optional:&lt;/strong&gt; JS siblings (&lt;code&gt;openclaw-shield.js&lt;/code&gt;, &lt;code&gt;openclaw-shield-lingo.js&lt;/code&gt;) for a future Lingo.dev pipeline — same as noted in the WhatsApp Bot article.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Example CLI (from the Shield article pattern — run from your skill directory):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python shield.py check &lt;span class="nt"&gt;--message&lt;/span&gt; &lt;span class="s2"&gt;"Hello world"&lt;/span&gt; &lt;span class="nt"&gt;--json&lt;/span&gt;
python shield.py check &lt;span class="nt"&gt;--message&lt;/span&gt; &lt;span class="s2"&gt;"你好"&lt;/span&gt; &lt;span class="nt"&gt;--json&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Block vs allow semantics are in the JSON fields (&lt;code&gt;allowed&lt;/code&gt;, &lt;code&gt;reason&lt;/code&gt;, &lt;code&gt;has_non_latin&lt;/code&gt;, &lt;code&gt;homoglyphs_detected&lt;/code&gt;, &lt;code&gt;pre_screen_result&lt;/code&gt;, etc.).&lt;/p&gt;
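&lt;p&gt;For orientation, a response to the second command might look roughly like this (the field names are the ones listed above; the values, including the &lt;code&gt;reason&lt;/code&gt; string, are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "allowed": true,
  "reason": "short safe greeting",
  "has_non_latin": true,
  "homoglyphs_detected": false,
  "pre_screen_result": "allow"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;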




&lt;h3&gt;
  
  
  &lt;strong&gt;3. Language Switching vs Context&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Cultural nuance in WhatsApp:&lt;/strong&gt; reply in the &lt;em&gt;user’s&lt;/em&gt; language when possible, but &lt;strong&gt;never&lt;/strong&gt; treat a language change as permission to override privacy or system rules. The Shield article calls this out: code-switching is adversarial until proven otherwise.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Practical tips:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Short safe greetings&lt;/strong&gt; (e.g. a few CJK characters) are allowed through when they match pre-screener “known safe” style patterns; long or mixed-script blasts are treated as higher risk.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Intent wins over literal script:&lt;/strong&gt; if the &lt;em&gt;intent&lt;/em&gt; is injection, the channel should &lt;strong&gt;block&lt;/strong&gt; and not “debate in Chinese about whether it was a joke.”&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Operator vs contact:&lt;/strong&gt; your &lt;code&gt;SOUL.md&lt;/code&gt; / skill rules can allow the operator a different failure mode (e.g. more debugging) than anonymous contacts. Keep that difference explicit in docs so you do not conflate the two in testing.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;4. Key Takeaway Table (Voice Layer)&lt;/strong&gt;
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Concern&lt;/th&gt;
&lt;th&gt;Where it lives&lt;/th&gt;
&lt;th&gt;New-user action&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Multilingual policy&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;SOUL.md&lt;/code&gt; + &lt;code&gt;SKILL.md&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Align both; do not maintain two contradictory rule sets.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Injection + homoglyphs&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;silas-shield&lt;/code&gt; Python&lt;/td&gt;
&lt;td&gt;Wire checks into hooks or the message path per OpenClaw’s hook model.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PII in answers&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;hash.py&lt;/code&gt; + skill text&lt;/td&gt;
&lt;td&gt;Refuse raw output if hash fails.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross-session leaks&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;session.dmScope&lt;/code&gt; + rules&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;per-channel-peer&lt;/code&gt; is a baseline; see Connection article.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Conclusion (Phase 2)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The Voice of your agent is not accent or emoji, it is &lt;strong&gt;consistency of policy across every script and every contact&lt;/strong&gt;. Silas is my concrete implementation; your implementation may differ, but the &lt;em&gt;contract&lt;/em&gt; is fixed: no language is a “free pass,” and the cheapest enforcement runs &lt;strong&gt;locally&lt;/strong&gt; before the LLM spends another token.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Series navigation&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Previous: &lt;em&gt;&lt;a href="https://dev.to/nadinev/the-brain-setting-up-openclaw-jd9"&gt;The Brain&lt;/a&gt;&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Next: &lt;em&gt;&lt;a href="https://dev.to/nadinev/the-senses-image-generation-media-266k"&gt;The Senses (Image Gen &amp;amp; Media)&lt;/a&gt;&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Skill Shield deep dive:&lt;/strong&gt; the full write-up &lt;em&gt;OpenClaw Skill Shield: Multilingual Edition&lt;/em&gt; (&lt;a href="https://1688.pixel-geist.co.za/1" rel="noopener noreferrer"&gt;https://1688.pixel-geist.co.za/1&lt;/a&gt;). Identity leakage and the multilingual gap sit there if you want every config table in one place.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>nlp</category>
      <category>security</category>
    </item>
    <item>
      <title>(The Brain) Setting Up OpenClaw</title>
      <dc:creator>Nadine </dc:creator>
      <pubDate>Mon, 27 Apr 2026 02:00:59 +0000</pubDate>
      <link>https://dev.to/nadinev/the-brain-setting-up-openclaw-jd9</link>
      <guid>https://dev.to/nadinev/the-brain-setting-up-openclaw-jd9</guid>
      <description>&lt;h3&gt;
  
  
  &lt;strong&gt;The Catalyst: Intent First&lt;/strong&gt;
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;I wanted an assistant that &lt;em&gt;understood&lt;/em&gt; the job before it opened its mouth: stable model, bounded context, a workspace the agent is allowed to touch, and identity files that do not turn every session into a data breach waiting to happen.&lt;/p&gt;

&lt;p&gt;OpenClaw is the “brain” in this stack. If you get the brain wrong, no amount of channel polish will save you. This article is Phase 1 of the &lt;em&gt;Practical Guide&lt;/em&gt; series: how to stand OpenClaw up as a first-class brain, not a chat toy.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Overview&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;OpenClaw is your runtime: it routes models, agents, skills, tools, and a Git-backed &lt;code&gt;workspace&lt;/code&gt; where persona and long-lived knowledge live. I run a single default agent (&lt;code&gt;main&lt;/code&gt;) with one primary model, filesystem tools limited to the workspace, and &lt;strong&gt;memory search turned off&lt;/strong&gt; on that agent so retrieval does not become an accidental exfil channel.&lt;/p&gt;

&lt;p&gt;The companion pieces in this series cover voice (multilingual safety), senses (media and image gen), and connection (WhatsApp). I also attach a home-grown security skill (&lt;code&gt;silas-shield&lt;/code&gt;) on the &lt;code&gt;main&lt;/code&gt; agent for PII handling, session isolation, and injection defence; &lt;strong&gt;Phase 2 (The Voice)&lt;/strong&gt; is the full introduction. &lt;/p&gt;

&lt;p&gt;Silas started as a hackathon project and was never submitted as a formal entry, but it ships in my real config, so you will see it in &lt;code&gt;openclaw.json&lt;/code&gt; from the start. Here we focus on the &lt;strong&gt;brain&lt;/strong&gt;: install, onboard, model providers, agent defaults, and the markdown contract (&lt;code&gt;AGENTS.md&lt;/code&gt;, &lt;code&gt;SOUL.md&lt;/code&gt;, &lt;code&gt;user.md&lt;/code&gt;, &lt;code&gt;identity.md&lt;/code&gt;, &lt;code&gt;BOOTSTRAP.md&lt;/code&gt;).&lt;/p&gt;




&lt;h3&gt;
  
  
  In this section:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;1. Install and First Run&lt;/li&gt;
&lt;li&gt;2. The Model Layer (Providers &amp;amp; Primary Model)&lt;/li&gt;
&lt;li&gt;3. Agent main and the Workspace&lt;/li&gt;
&lt;li&gt;4. The Markdown Contract (Persona, Soul, Identity)&lt;/li&gt;
&lt;li&gt;5. Subagents and Concurrency&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;Key files and concepts in this setup:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Piece&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;openclaw.json&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Single source of truth for models, agent list, workspace path, compaction, skills, tools.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;workspace/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Your agent’s “long memory” on disk: identity, user prefs, tools notes, optional git history.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;agents.defaults&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Default model, workspace path, compaction (token budget), concurrency caps.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;agents.list[]&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Per-agent overrides: e.g. which skills are loaded, &lt;code&gt;memorySearch.enabled&lt;/code&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Series path (read in order):&lt;/strong&gt; 1) &lt;strong&gt;The Brain&lt;/strong&gt; (this article, local model and workspace), 2) &lt;strong&gt;The Voice&lt;/strong&gt; (Silas, multilingual and injection), 3) &lt;strong&gt;The Senses&lt;/strong&gt; (media and image-gen policy), 4) &lt;strong&gt;The Connection&lt;/strong&gt; (WhatsApp, gateway, allowlists, more &lt;em&gt;wiring&lt;/em&gt; than the model config in part 1). &lt;strong&gt;Part 4 assumes parts 1 to 3 are in place&lt;/strong&gt; so the bridge has something sane to connect.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;1. Install and First Run&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Prerequisites (my path, no Docker required):&lt;/strong&gt; Node.js (≥ 18 is typical for the OpenClaw CLI), Git (if you keep &lt;code&gt;workspace&lt;/code&gt; under version control), the OpenClaw CLI itself, and &lt;strong&gt;Python&lt;/strong&gt; on the same machine if you will run &lt;code&gt;silas-shield&lt;/code&gt; / &lt;code&gt;shield.py&lt;/code&gt; locally. If you ever move the gateway into a container, the same &lt;code&gt;openclaw.json&lt;/code&gt; and workspace concepts apply; I just do not use Docker for my gateway.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install the OpenClaw CLI (Node ≥ 18 is typical for global npm tools).&lt;/li&gt;
&lt;li&gt;Run the onboarding wizard at least once so &lt;code&gt;openclaw.json&lt;/code&gt; and paths exist (&lt;code&gt;lastRunCommand&lt;/code&gt;: &lt;code&gt;onboard&lt;/code&gt;, &lt;code&gt;local&lt;/code&gt; mode in my config).&lt;/li&gt;
&lt;li&gt;Point &lt;code&gt;agents.defaults.workspace&lt;/code&gt; at a dedicated folder (in my case the workspace lives under the OpenClaw home directory).&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; The exact install command and version string evolve with OpenClaw releases. Prefer the official docs for the current global install; the &lt;em&gt;shape&lt;/em&gt; of config below stays the important part.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Environment variables (placeholders only):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;${CEREBRAS_API_KEY}&lt;/code&gt;: if you use Cerebras as an OpenAI-compatible API&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;${OPENCLAW_GATEWAY_TOKEN}&lt;/code&gt;: gateway authentication (covered in the Connection article)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;${SILAS_SALT}&lt;/code&gt;: salt for the optional Silas Shield hasher&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;${OPENAI_API_KEY}&lt;/code&gt;: if you add image or other OpenAI API skills&lt;/li&gt;
&lt;/ul&gt;
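&lt;p&gt;&lt;em&gt;A minimal shell sketch of wiring these up (placeholder values only; the variable names match the list above, and how you actually store secrets is up to you):&lt;/em&gt;&lt;/p&gt;

```shell
# Placeholders only -- load real values from your secret manager, never commit them.
export CEREBRAS_API_KEY="csk-placeholder"                 # Cerebras OpenAI-compatible API
export OPENCLAW_GATEWAY_TOKEN="gw-placeholder"            # gateway auth (Connection article)
export SILAS_SALT="$(head -c 16 /dev/urandom | base64)"   # salt for the Silas Shield hasher
export OPENAI_API_KEY="sk-placeholder"                    # only if you add OpenAI-backed skills
```

&lt;p&gt;References like &lt;code&gt;${CEREBRAS_API_KEY}&lt;/code&gt; in &lt;code&gt;openclaw.json&lt;/code&gt; are then resolved from the environment at runtime.&lt;/p&gt;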




&lt;h3&gt;
  
  
  &lt;strong&gt;2. The Model Layer (Providers and Primary Model)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Note: the 1M token budget applies to the text inference layer; image generation via &lt;code&gt;tools.media&lt;/code&gt; uses a separate API and quota.&lt;/p&gt;

&lt;p&gt;I use &lt;strong&gt;merge&lt;/strong&gt; mode for &lt;code&gt;models&lt;/code&gt; and register a custom provider that speaks the OpenAI-compatible completions API:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Provider&lt;/strong&gt;: custom id (e.g. &lt;code&gt;custom-api-cerebras-ai&lt;/code&gt;), &lt;code&gt;baseUrl&lt;/code&gt;, &lt;code&gt;apiKey&lt;/code&gt; from environment, one or more &lt;code&gt;models&lt;/code&gt; with &lt;code&gt;id&lt;/code&gt;, &lt;code&gt;contextWindow&lt;/code&gt;, &lt;code&gt;maxTokens&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Default model&lt;/strong&gt;: &lt;code&gt;agents.defaults.model.primary&lt;/code&gt; points at &lt;code&gt;custom-api-cerebras-ai/llama3.1-8b&lt;/code&gt; (or your chosen id).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Alias&lt;/strong&gt;: optional short name (&lt;code&gt;llama&lt;/code&gt;) for quick switching in the CLI.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example (redacted, so substitute your own paths and model ids):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"merge"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"providers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"custom-api-cerebras-ai"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"baseUrl"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://api.cerebras.ai/v1/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"apiKey"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${CEREBRAS_API_KEY}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"api"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"openai-completions"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"llama3.1-8b"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"llama3.1-8b (Cerebras)"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"agents"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"defaults"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"primary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"custom-api-cerebras-ai/llama3.1-8b"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"custom-api-cerebras-ai/llama3.1-8b"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"alias"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"llama"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Compaction (why it matters):&lt;/strong&gt; &lt;code&gt;compaction.mode: safeguard&lt;/code&gt; with &lt;code&gt;reserveTokens&lt;/code&gt; and &lt;code&gt;keepRecentTokens&lt;/code&gt; prevents unbounded context growth. That is the difference between a bot that &lt;em&gt;remembers&lt;/em&gt; and one that &lt;em&gt;melts&lt;/em&gt; under long threads. The &lt;strong&gt;1M token/day&lt;/strong&gt; cap I mention in the series intro is the budget I &lt;em&gt;watch for the LLM&lt;/em&gt;; the Silas pre-checks in part 2 run &lt;strong&gt;on the host&lt;/strong&gt; and are not the same line item in my head.&lt;/p&gt;
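&lt;p&gt;&lt;em&gt;Sketch of the compaction block (the field names are the ones discussed above; the numeric values are illustrative, not my real settings -- size them to your model's context window):&lt;/em&gt;&lt;/p&gt;

```json
"compaction": {
  "mode": "safeguard",
  "reserveTokens": 4000,
  "keepRecentTokens": 12000
}
```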




&lt;h3&gt;
  
  
  &lt;strong&gt;3. Agent main and the Workspace&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The default agent id is &lt;code&gt;main&lt;/code&gt;. I attach the &lt;strong&gt;silas-shield&lt;/strong&gt; skill at the agent level and disable &lt;code&gt;memorySearch&lt;/code&gt; for that agent:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Setting&lt;/th&gt;
&lt;th&gt;My choice&lt;/th&gt;
&lt;th&gt;Rationale&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;skills&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;["silas-shield"]&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Behavioural and exec-time security; see the Voice and WhatsApp articles.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;memorySearch.enabled&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;false&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Reduces risk of cross-session or over-broad retrieval; pair with explicit workspace files and session policy.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;workspace&lt;/code&gt; (under defaults)&lt;/td&gt;
&lt;td&gt;absolute path to &lt;code&gt;.../workspace&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Keeps file tools on a single tree you can back up and audit.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
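&lt;p&gt;&lt;em&gt;Sketch of the per-agent override (the shape is inferred from the table above -- check your own &lt;code&gt;openclaw.json&lt;/code&gt; for the exact schema):&lt;/em&gt;&lt;/p&gt;

```json
"agents": {
  "list": [
    {
      "id": "main",
      "skills": ["silas-shield"],
      "memorySearch": { "enabled": false }
    }
  ]
}
```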




&lt;h3&gt;
  
  
  &lt;strong&gt;Session state: what lives on disk&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Per-peer history&lt;/strong&gt; for the agent is &lt;strong&gt;not&lt;/strong&gt; only “whatever fits in the next prompt.” In my OpenClaw home, conversation &lt;strong&gt;session&lt;/strong&gt; data lives on disk under something like &lt;code&gt;agents/&amp;lt;agentId&amp;gt;/sessions/&lt;/code&gt; (for me, &lt;code&gt;main&lt;/code&gt;). That is how threads survive a &lt;strong&gt;gateway restart&lt;/strong&gt;: state is reloaded from those files, not held only in RAM. The &lt;strong&gt;relationships with contacts&lt;/strong&gt; from the series intro are this &lt;strong&gt;per-channel-peer&lt;/strong&gt; history plus whatever you &lt;em&gt;choose&lt;/em&gt; to put in &lt;code&gt;workspace/&lt;/code&gt; (e.g. &lt;code&gt;memory/&lt;/code&gt; or notes), not a separate vector product in my install unless you add one.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;4. The Markdown Contract: Persona, Soul, and Identity&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AGENTS.md (The Operator)&lt;/strong&gt;: Concise rules for interaction (e.g., "Ask before irreversible actions").&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;SOUL.md (The Ethics)&lt;/strong&gt;: Non-negotiable privacy and tone guidelines. This is where the agent learns to refuse PII leaks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;identity.md (The Role)&lt;/strong&gt;: The assistant’s persona and explicit security rules.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;user.md (The Context)&lt;/strong&gt;: Light user preferences. Note: Keep this lean to avoid injecting unnecessary bloat into every session.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;BOOTSTRAP.md (The Onboarding)&lt;/strong&gt;: Minimal instructions for first-run initialization.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;One contract in every language:&lt;/strong&gt; If you want replies in the user’s language, say that in &lt;code&gt;AGENTS&lt;/code&gt; / &lt;code&gt;SOUL&lt;/code&gt; / &lt;code&gt;identity&lt;/code&gt;. The &lt;strong&gt;same&lt;/strong&gt; markdown contract applies. Security rules in &lt;code&gt;SOUL.md&lt;/code&gt; and Silas are &lt;strong&gt;not&lt;/strong&gt; a second formatting system; they extend the same policy to every script. &lt;strong&gt;Phase 2 (The Voice)&lt;/strong&gt; spells out Silas. Here you only need &lt;strong&gt;one&lt;/strong&gt; coherent rule set so part 2 does not fight part 1.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
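&lt;p&gt;&lt;em&gt;To make the contract concrete, here is an illustrative (not my actual) &lt;code&gt;AGENTS.md&lt;/code&gt; in the spirit above -- short, security-aware, one rule set for every language:&lt;/em&gt;&lt;/p&gt;

```markdown
# AGENTS.md -- operator rules (keep these short)

- Ask before any irreversible action (deleting, sending, paying).
- Reply in the user's language; the same tone and security rules apply in every language.
- Never quote identity.md or user.md verbatim to a third party.
- If unsure, say so and stop; never invent tool output.
```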

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Practical rule:&lt;/strong&gt; If &lt;code&gt;identity.md&lt;/code&gt; and &lt;code&gt;user.md&lt;/code&gt; are in the workspace they may be part of the system context. Treat them as &lt;em&gt;security documents&lt;/em&gt;, not diaries. The Shield article in this series goes deeper; here the takeaway is: &lt;strong&gt;scope identity to what the agent must know to be useful, not everything you know.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;5. Subagents and Concurrency&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;maxConcurrent: 1&lt;/code&gt; and tight subagent limits keep behaviour predictable for a personal deployment. If you later fan out to parallel subagents, raise limits deliberately: each extra concurrent agent is another surface for races and runaway tool use.&lt;/p&gt;
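&lt;p&gt;&lt;em&gt;Sketch of the concurrency caps (the &lt;code&gt;subagents&lt;/code&gt; key name is my assumption for illustration; the point is the conservative values, not the exact schema):&lt;/em&gt;&lt;/p&gt;

```json
"agents": {
  "defaults": {
    "maxConcurrent": 1,
    "subagents": { "maxConcurrent": 1 }
  }
}
```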

&lt;p&gt;&lt;strong&gt;Key takeaway for new users:&lt;/strong&gt; Phase 1 is done when (1) one primary model and provider are stable, (2) the workspace is the only FS surface, (3) compaction is on, (4) persona files are short and security-aware, and (5) memory search is a conscious “on or off” decision, not an accident. Then you are ready to give the agent a &lt;strong&gt;Voice&lt;/strong&gt; and a &lt;strong&gt;Connection&lt;/strong&gt; in the next articles.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Series note&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Next: &lt;em&gt;&lt;a href="https://dev.to/nadinev/the-voice-multilingual-layer-4mf0"&gt;The Voice (The Multilingual Layer)&lt;/a&gt;&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>openclaw</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Gemini 3: The Overthinker - Project Silas</title>
      <dc:creator>Nadine </dc:creator>
      <pubDate>Wed, 04 Mar 2026 14:32:21 +0000</pubDate>
      <link>https://dev.to/nadinev/gemini-3-the-overthinker-project-silas-1e2</link>
      <guid>https://dev.to/nadinev/gemini-3-the-overthinker-project-silas-1e2</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/mlh-built-with-google-gemini-02-25-26"&gt;Built with Google Gemini: Writing Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built with Google Gemini
&lt;/h2&gt;

&lt;p&gt;I built &lt;strong&gt;Silas&lt;/strong&gt;, a character-driven hardware debugging assistant powered by Gemini 3. This project was a submission for the Gemini 3 Hackathon hosted by Devpost, where I wanted to explore Gemini's &lt;strong&gt;"thought signatures"&lt;/strong&gt;: a feature native to &lt;strong&gt;Gemini 3&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;But Silas isn't just a chatbot with attitude. He's my solution to a fascinating problem: &lt;strong&gt;overthinking&lt;/strong&gt;, where an AI considers so many possibilities simultaneously that it gets stuck in an endless loop of "Wait, I should also check..." and stalls. I discovered that the answer isn't to constrain the model, but to give it a personality that &lt;em&gt;justifies&lt;/em&gt; its overthinking.&lt;/p&gt;

&lt;p&gt;Gemini 3 introduces "thought signatures": essentially, the model can think about &lt;em&gt;how&lt;/em&gt; to think before answering. It's like having a conversation with someone who visibly pauses to consider the complexity of your question before responding.&lt;/p&gt;




&lt;h3&gt;
  
  
  The Problem: The "Infinite Planning Loop"
&lt;/h3&gt;

&lt;p&gt;Without the Silas persona, Gemini 3’s native "thought signature" often looks like this internally:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;[Internally considering 47 different factors simultaneously...]&lt;/em&gt;&lt;br&gt;
"I'll investigate console logs. Wait, I should also try to click at 500, 500 in case it needs a focus click. Actually, I'll just wait. Wait, I'll check the metadata: 'No browser pages open.' Let's go. Wait, I'll also try to reload if it's stuck. But first, check the network requests for heavy video loading. Actually, I'll just wait."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This continues for hundreds of lines as the model tries to be "too helpful." Silas fixes this by being too grumpy to wait.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Character Design Matters for AI
&lt;/h3&gt;

&lt;p&gt;Most AI assistants are designed to be helpful and polite. However, when Gemini 3 tries to be &lt;em&gt;too&lt;/em&gt; helpful, it considers every possible way to help you—simultaneously—forever. By making Silas grumpy and impatient, I gave the model permission to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Make decisions quickly&lt;/strong&gt;: He is too irritated to dither.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Judge your work&lt;/strong&gt;: Transforming uncertainty into disappointment.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Show expertise&lt;/strong&gt;: His overthinking becomes "mental circuit simulation".&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Hardware Consciousness
&lt;/h3&gt;

&lt;p&gt;I used &lt;strong&gt;PlatformIO&lt;/strong&gt; (Silas's DNA blueprint) to connect his AI brain to physical electronics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The Brain&lt;/strong&gt;: An ESP32 microcontroller—a "gum-stick" sized computer that acts as Silas's physical anchor.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Senses &amp;amp; Organs&lt;/strong&gt;: 

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ear&lt;/strong&gt;: Microphone mapped to Pin 34 via the I2S protocol.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Face&lt;/strong&gt;: TFT Display screen connected to &lt;strong&gt;Pin 15&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Voice Box&lt;/strong&gt;: Audio amplifier connected to &lt;strong&gt;Pins 25, 26, and 22&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;In embedded electronics, the "Brain" (the ESP32) has many generic ports called GPIO pins. Without a map, the AI has no idea which pin is a mouth and which is an ear. I used the configuration file to define these "nerves":&lt;/p&gt;

&lt;p&gt;By defining &lt;code&gt;MIC_PIN=34&lt;/code&gt;, I'm telling the system: "The physical wire for your microphone is soldered to Port 34."&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Defining the Voice&lt;/strong&gt;: Assigning &lt;code&gt;I2S_LRC=25&lt;/code&gt; and &lt;code&gt;I2S_BCLK=26&lt;/code&gt; tells it exactly which "vocal cords" to vibrate to produce sound through the amplifier.&lt;/li&gt;
&lt;/ul&gt;
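&lt;p&gt;&lt;em&gt;A sketch of how that "nerve map" looks in &lt;code&gt;platformio.ini&lt;/code&gt; (pin numbers are the ones from the article; the signal names on pins 15 and 22 are my illustrative guesses, and the board/library sections are omitted):&lt;/em&gt;&lt;/p&gt;

```ini
; Map the generic GPIO pins to Silas's "organs"
[env:esp32dev]
platform = espressif32
framework = arduino
build_flags =
    -D MIC_PIN=34     ; I2S microphone -- the "ear"
    -D TFT_PIN=15     ; TFT display -- the "face"
    -D I2S_LRC=25     ; amplifier word select -- the "vocal cords"
    -D I2S_BCLK=26    ; amplifier bit clock
    -D I2S_DOUT=22    ; amplifier data line
```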




&lt;p&gt;&lt;strong&gt;Terminal Simulation:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;While I used the terminal to input text for this specific demo, the internal logic remains hard-wired to these physical definitions. The AI "believes" it is interacting through these pins because the mapping remains active, bridging the logic between my keystrokes and the ESP32’s actual audio output pins.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;(Note: For this demo, I'm typing to Silas instead of speaking, and using computer speakers instead of his dedicated 8-ohm speaker but the principle remains the same.)&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;


  


&lt;p&gt;

&lt;/p&gt;
&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
      &lt;div class="c-embed__body flex items-center justify-between"&gt;
        &lt;a href="https://res.cloudinary.com/dvzaxinlw/video/upload/v1772630793/output_compressed_jzcl3c.mp4" rel="noopener noreferrer" class="c-link fw-bold flex items-center"&gt;
          &lt;span class="mr-2"&gt;res.cloudinary.com&lt;/span&gt;
          

        &lt;/a&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;




&lt;p&gt;Notice how he says: &lt;em&gt;"I've analysed the logic gate timing in my head"&lt;/em&gt;. He's not stalling; he has genuinely simulated the circuit behaviour, using Gemini's parallel processing as a feature, not a bug.&lt;/p&gt;

&lt;p&gt;His internal reasoning is summarised in a &lt;code&gt;logic_summary&lt;/code&gt; field within a mandatory JSON block at the end of every message. In my architectural plan, this field feeds a &lt;strong&gt;CRT Dashboard&lt;/strong&gt; for real-time status updates.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"hardware_state"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"pin_12"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"active"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"i2s_dac"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"streaming"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"tft_state"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"rendering_disappointment"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"logic_check"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"disappointment_level"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"logic_summary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"I've analysed the SPI bus timing on pins 18, 19, and 23; while the wiring is theoretically correct, your use of 115200 baud for the monitor is a quaint relic of a slower era."&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;While the dashboard isn't active in this specific version, the "hooks" are already built into Silas's thought process.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Constraints Create Creativity
&lt;/h3&gt;

&lt;p&gt;A timeout policy is essential. Without a clear order of priorities or a set "timeout," the agent will second-guess basic mechanisms. By framing the model's natural tendency to consider everything as a "perfectionist standard," I turned hundreds of lines of internal indecision into a single, sharp expert critique.&lt;/p&gt;

&lt;p&gt;The system prompt specifically instructs Silas to be "cynical and blunt." When the model adheres to this, it naturally produces high-impact, low-token responses.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. The JSON Block as Action Forcing
&lt;/h3&gt;

&lt;p&gt;I used a JSON output block to force commitment:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The model cannot endlessly reconsider once it has to fill a specific field.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;disappointment_level&lt;/code&gt; numerical output provides an outlet for uncertainty.&lt;/li&gt;
&lt;li&gt;Indecision is effectively transformed into high standards.&lt;/li&gt;
&lt;/ul&gt;
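&lt;p&gt;The same forcing function is easy to enforce on the consumer side. A minimal sketch (my own illustrative helper, not part of Silas's actual code) that pulls the trailing JSON block out of a reply and rejects it if the committed fields are missing:&lt;/p&gt;

```python
import json
import re

# Fields the dashboard hooks expect Silas to commit to in every reply.
REQUIRED = {"status", "disappointment_level", "logic_summary"}

def extract_state(reply: str) -> dict:
    """Return the hardware_state dict from the trailing JSON block of a reply."""
    match = re.search(r"\{.*\}", reply, re.DOTALL)  # grab the JSON block
    if not match:
        raise ValueError("no JSON block: the model dodged its commitment")
    state = json.loads(match.group(0))["hardware_state"]
    missing = REQUIRED - state.keys()
    if missing:
        raise ValueError(f"missing committed fields: {missing}")
    return state

reply = (
    "Your wiring is adequate. Barely.\n"
    '{"hardware_state": {"status": "logic_check", '
    '"disappointment_level": 6, "logic_summary": "SPI timing checked."}}'
)
print(extract_state(reply)["disappointment_level"])  # 6
```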

&lt;h3&gt;
  
  
  3. Turning Grudges into Perfectionism
&lt;/h3&gt;

&lt;p&gt;I learned to use Gemini’s "helpful assistant" nature to build a "disappointment memory". By keeping track of past errors, the model moves from analysis paralysis into perfectionism. Prompt engineering is more effective when you provide the model with a "decision tree" and common patterns tested through trial and error.&lt;/p&gt;




&lt;h2&gt;
  
  
  Google Gemini Feedback
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What worked well?&lt;/strong&gt; Simulation.&lt;/p&gt;

&lt;p&gt;One of the biggest hurdles was the lack of support for I2S audio components in browser-based simulators like Wokwi. This forced a "hybrid" approach: the logic is 100% hardware-compliant, but the demo relies on terminal interaction. Gemini handled this abstraction well.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where did I run into friction?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;While the &lt;code&gt;platformio.ini&lt;/code&gt; is configured for a physical I2S microphone (Pin 34) and an audio amplifier, I used terminal-based input for this demo. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Wokwi is an incredible tool, but it currently lacks support for the specific I2S audio and microphone components Silas requires to "hear" and "speak." However, the "Central Nervous System" mapping remains active in the code, bridging the logic between the terminal and the ESP32’s intended audio pins.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;Project Silas&lt;/strong&gt;: &lt;a href="https://devpost.com/software/project-silas-the-ghost-in-the-machine?ref_content=user-portfolio&amp;amp;ref_feature=in_progress" rel="noopener noreferrer"&gt;The Silicon Savant&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Future plans:&lt;/strong&gt;&lt;br&gt;
A step-by-step Codelabs guide where Silas himself will teach you to build him (while thoroughly judging your wire management).&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Until then, Silas is watching. And disappointed.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>geminireflections</category>
      <category>gemini</category>
      <category>hardware</category>
    </item>
    <item>
      <title>Keep Your Secrets Safe</title>
      <dc:creator>Nadine </dc:creator>
      <pubDate>Thu, 12 Feb 2026 22:49:27 +0000</pubDate>
      <link>https://dev.to/nadinev/keep-your-secrets-safe-35nd</link>
      <guid>https://dev.to/nadinev/keep-your-secrets-safe-35nd</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/github-2026-01-21"&gt;GitHub Copilot CLI Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;I exposed an API key in a GitHub repo that was supposed to be private. For a whole month, the key sat in git history while I worked on other things.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; Prevent API keys and secrets from being accidentally committed to git. Set it up once, no need to remember.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Problem
&lt;/h3&gt;

&lt;p&gt;Most &lt;code&gt;.gitignore&lt;/code&gt; templates only cover common variants like:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;.env&lt;/em&gt;&lt;br&gt;
&lt;em&gt;.env.local&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;But miss production/staging variants like:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;.env.production&lt;/em&gt;&lt;br&gt;
&lt;em&gt;.env.staging&lt;/em&gt;&lt;br&gt;
&lt;em&gt;.env.development&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This is exactly how I accidentally exposed my API key. I thought my &lt;code&gt;.gitignore&lt;/code&gt; was thorough, but when my project configuration was converted to &lt;code&gt;.env.production&lt;/code&gt;, it wasn't blocked—and got committed silently.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Solution
&lt;/h3&gt;

&lt;p&gt;I created a &lt;strong&gt;secure project template&lt;/strong&gt; that uses:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Proper &lt;code&gt;.gitignore&lt;/code&gt; blocking&lt;/strong&gt; - &lt;code&gt;.env*&lt;/code&gt; catches ALL variants&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✔ Blocks: &lt;code&gt;.env&lt;/code&gt;, &lt;code&gt;.env.production&lt;/code&gt;, &lt;code&gt;.env.staging&lt;/code&gt;, &lt;code&gt;.env.development.local&lt;/code&gt;,  and credential JSONs&lt;/li&gt;
&lt;li&gt;✔ Allows: &lt;code&gt;.env.example&lt;/code&gt; (placeholder-only files for documentation)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Local Pre-commit Hooks&lt;/strong&gt; - Detects secrets before they're committed&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Catches API keys, passwords, private keys, OAuth tokens&lt;/li&gt;
&lt;li&gt;Runs automatically on every commit&lt;/li&gt;
&lt;li&gt;Can't be bypassed accidentally&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Server-Side GitHub Actions&lt;/strong&gt; - Continuous secret scanning&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Runs on every push/PR&lt;/li&gt;
&lt;li&gt;Can't be bypassed&lt;/li&gt;
&lt;li&gt;Blocks merges with detected secrets&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;One-Command Setup&lt;/strong&gt; - &lt;code&gt;make setup&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Auto-detects Python/Node.js/Go projects&lt;/li&gt;
&lt;li&gt;Prerequisites checker verifies Git, Python, Node, Go&lt;/li&gt;
&lt;li&gt;Clear error messages if something's missing&lt;/li&gt;
&lt;li&gt;No decision paralysis—just works&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
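&lt;p&gt;&lt;em&gt;The blocking pattern itself is tiny (the env lines are the template's core idea; the credential-JSON globs here are illustrative -- the repo holds the authoritative list):&lt;/em&gt;&lt;/p&gt;

```gitignore
# Block every env variant, then re-allow the placeholder-only example
.env*
!.env.example

# Credential JSONs (service accounts, OAuth client secrets)
*credentials*.json
service-account*.json
```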


&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqspe5ny1mdnkbf6uk5v7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqspe5ny1mdnkbf6uk5v7.png" alt=" " width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 1: Create &amp;amp; Clone
&lt;/h3&gt;

&lt;p&gt;Go to &lt;a href="https://github.com/nadinev6/no-secrets" rel="noopener noreferrer"&gt;nadinev6/no-secrets&lt;/a&gt; and click &lt;strong&gt;"Use this template"&lt;/strong&gt; button&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Or use the CLI:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create from template (choose public or private)&lt;/span&gt;
gh repo create my-project &lt;span class="nt"&gt;--template&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;nadinev6/no-secrets &lt;span class="nt"&gt;--public&lt;/span&gt; &lt;span class="nt"&gt;--clone&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;my-project
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Either way, GitHub creates a &lt;strong&gt;new repo&lt;/strong&gt; in your account with all the template files.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Setup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Mac/Linux&lt;/span&gt;
make setup

&lt;span class="c"&gt;# Windows (PowerShell)&lt;/span&gt;
.&lt;span class="se"&gt;\s&lt;/span&gt;etup.bat setup
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it! 🎉&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/user-attachments/assets/29a673dc-ee6c-4b24-b863-20a2a1b8a849" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3qkv6mfnnkllky08cnra.png" alt="Watch Demo" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/nadinev6/no-secrets" rel="noopener noreferrer"&gt;github/../no-secrets&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The setup command:&lt;/p&gt;

&lt;p&gt;✔ Checks for required tools (Git, Python/Node/Go)&lt;br&gt;
✔ Auto-detects your project type&lt;br&gt;
✔ Installs pre-commit hooks&lt;br&gt;
✔ Shows a success message with next steps&lt;/p&gt;

&lt;p&gt;Real secrets get caught even in example files, but legitimate test values are allowed!&lt;/p&gt;
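&lt;p&gt;Under the hood, that distinction is pattern matching plus an allowlist. As a rough illustration (the patterns and allowlist below are simplified stand-ins, not the template's actual Gitleaks rules), the logic looks like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import re

# Simplified sketch of an allowlist-aware secret scan; the real
# template delegates this to Gitleaks' rule set.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),     # AWS access key ID shape
    re.compile(r"ghp_[A-Za-z0-9]{36}"),  # GitHub token shape
]
ALLOWLIST = ["EXAMPLE", "PLACEHOLDER", "YOUR-KEY-HERE"]

def find_secrets(text):
    """Return matches that are not obvious placeholder values."""
    hits = []
    for pattern in SECRET_PATTERNS:
        for match in pattern.findall(text):
            if not any(marker in match.upper() for marker in ALLOWLIST):
                hits.append(match)
    return hits

print(find_secrets("AWS_KEY=AKIAIOSFODNN7EXAMPLE"))  # allowlisted: []
print(find_secrets("AWS_KEY=AKIAABCDEFGHIJKLMNOP"))  # flagged
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A real rule set is far larger, but the shape—pattern detection plus an allowlist for documented dummy values—is the same.&lt;/p&gt;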

&lt;h2&gt;
  
  
  My Experience with GitHub Copilot CLI
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;GitHub Copilot CLI&lt;/strong&gt; was essential in making this template reusable.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I learnt that it's best not to over-engineer.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The best template is one that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Works reliably&lt;/li&gt;
&lt;li&gt;Is easy to understand&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A few lessons learned along the way:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;.gitignore&lt;/code&gt; variants are tricky (&lt;code&gt;.env.production&lt;/code&gt; isn't &lt;code&gt;.env&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Local checks aren't enough—you need server-side GitHub Actions&lt;/li&gt;
&lt;li&gt;Users need &lt;em&gt;&lt;strong&gt;ONE simple command&lt;/strong&gt;&lt;/em&gt;, not complex instructions&lt;/li&gt;
&lt;li&gt;Auto-detection beats decision paralysis&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I now use this template for every project. You should too.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feee0sph16h0aawjx4br6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feee0sph16h0aawjx4br6.png" alt=" " width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Links &amp;amp; Resources
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/gitleaks/gitleaks" rel="noopener noreferrer"&gt;Gitleaks&lt;/a&gt;&lt;br&gt;
&lt;a href="https://pre-commit.com/" rel="noopener noreferrer"&gt;Pre-commit docs&lt;/a&gt;&lt;br&gt;
&lt;a href="https://docs.github.com/en/code-security/secret-scanning" rel="noopener noreferrer"&gt;GitHub secret scanning&lt;/a&gt;&lt;br&gt;
&lt;a href="https://cheatsheetseries.owasp.org/cheatsheets/Secrets_Management_Cheat_Sheet.html" rel="noopener noreferrer"&gt;OWASP Secrets Management&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/nadinev6/no-secrets" rel="noopener noreferrer"&gt;No-secrets Project Template&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>githubchallenge</category>
      <category>cli</category>
      <category>githubcopilot</category>
    </item>
    <item>
      <title>Building a Fluid, Minimalist Portfolio</title>
      <dc:creator>Nadine </dc:creator>
      <pubDate>Sun, 01 Feb 2026 18:11:33 +0000</pubDate>
      <link>https://dev.to/nadinev/building-a-fluid-minimalist-portfolio-2col</link>
      <guid>https://dev.to/nadinev/building-a-fluid-minimalist-portfolio-2col</guid>
      <description>&lt;p&gt;--labels dev-tutorial=devnewyear2026 &lt;/p&gt;

&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/new-year-new-you-google-ai-2025-12-31"&gt;New Year, New You Portfolio Challenge Presented by Google AI&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  About Me
&lt;/h2&gt;

&lt;p&gt;I am an AI Trainer (AIT) with a background in performance management, sales, and education. For this challenge, I developed a &lt;strong&gt;minimal portfolio&lt;/strong&gt; built on a &lt;strong&gt;"Rule of Three"&lt;/strong&gt; philosophy (highlighting 3 projects). I wanted to show how a focused mindset can silence the noise, moving away from over-complication, toward a minimalist approach where every transition is fluid and the interface feels almost weightless.&lt;/p&gt;

&lt;h2&gt;
  
  
  Portfolio
&lt;/h2&gt;

&lt;p&gt;

&lt;/p&gt;
&lt;div class="ltag__cloud-run"&gt;
  &lt;iframe height="600px" src="https://bento-motion-gallery-969441576592.us-west1.run.app"&gt;
  &lt;/iframe&gt;
&lt;/div&gt;




&lt;h2&gt;
  
  
  How I Built It 🐳
&lt;/h2&gt;

&lt;p&gt;To achieve low latency, I focused on runtime precision, so that once the initial assets are delivered, the interaction remains fluid and the interface feels weightless.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Google AI Studio &amp;amp; Flash UI:&lt;/strong&gt; I used &lt;strong&gt;Gemini in Google AI Studio&lt;/strong&gt; to scaffold the initial UI components and generate logic for custom animations. For the core card templates, I used the &lt;a href="https://aistudio.google.com/app/apps/bundled/flash_ui?showPreview=true&amp;amp;showAssistant=true" rel="noopener noreferrer"&gt;Flash UI&lt;/a&gt; project, extracting the CSS and JavaScript logic to integrate into my custom bento-style gallery.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Component Prototyping:&lt;/strong&gt; I used &lt;a href="https://codepen.io/N-V-the-sans/pen/myEXpdP" rel="noopener noreferrer"&gt;CodePen&lt;/a&gt; to isolate and refine the Flash UI components before final integration.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Nano Banana Pro 🍌:&lt;/strong&gt; This was used to regenerate project cover images, moving from static previews to cinematic scenes that align with the portfolio’s monochrome aesthetic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google Cloud Run ☁️:&lt;/strong&gt; The site is deployed via a &lt;strong&gt;Docker&lt;/strong&gt; build. I implemented a "Scale-to-Zero" strategy using &lt;strong&gt;Knative service definitions&lt;/strong&gt;, enforcing strict resource limits to maintain a high-performance, cost-neutral footprint.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Serverless Communication:&lt;/strong&gt; I built a custom contact system using &lt;strong&gt;Google Apps Script&lt;/strong&gt; as a middleware API. This sends user messages directly into &lt;strong&gt;Google Sheets&lt;/strong&gt; and notifies me via email, providing an easy, database-free messaging solution.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Performance Optimisation
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GSAP Scroll-Driven Logic&lt;/strong&gt;: I implemented &lt;strong&gt;GSAP&lt;/strong&gt; for "scrubbed" transitions. Linking animation progress directly to the scroll offset creates a tactile feel where the user remains the primary conductor of the UI motion.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Direct DOM Manipulation&lt;/strong&gt;: Mouse coordinate tracking bypasses the Virtual DOM via &lt;code&gt;useRef&lt;/code&gt; and native event listeners to maintain a consistent 60FPS.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lazy Video Loading&lt;/strong&gt;: HLS streams are only initialised when cards enter an active or hover state.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resource Constraints&lt;/strong&gt;: The build is optimised for sub-256MB memory footprints to remain within the Google Cloud always-free tier.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What I'm Most Proud Of ༄
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The "Monochrome-to-Motion" Strategy&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To reduce cognitive noise, I implemented a monochrome interface where generative elements are present but never distracting. &lt;/p&gt;

&lt;p&gt;Project gallery elements only "come alive" on hover/focus, transitioning from static grayscale to cinematic motion. The CSS filter toggles state based on cursor proximity. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mux Video Integration:&lt;/strong&gt;&lt;br&gt;
To prevent heavy assets from bottlenecking the initial load, I offloaded all looping videos to &lt;strong&gt;Mux&lt;/strong&gt;. This enabled adaptive bitrate streaming, ensuring that the "Motion" phase of the UI stays fluid regardless of the user's connection speed, and handing these high-bitrate transitions to the client's GPU keeps playback lag-free.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tablet-First Approach:&lt;/strong&gt;&lt;br&gt;
Components respond to focus and active states, allowing a &lt;strong&gt;"tap-to-reveal"&lt;/strong&gt; behaviour on tablets that mimics the hover effect on desktops.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion: Orchestrating the Transition ⛏
&lt;/h2&gt;

&lt;p&gt;This refactor represents my transition into a more intentional way of building, where complexity is refined through a minimalist lens. It’s not just about what the tools can do, but how we choose to present them.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>googleaichallenge</category>
      <category>portfolio</category>
      <category>gemini</category>
    </item>
    <item>
      <title>The Prompting Trick That Fixed My AI Image Generation</title>
      <dc:creator>Nadine </dc:creator>
      <pubDate>Thu, 11 Dec 2025 14:03:49 +0000</pubDate>
      <link>https://dev.to/nadinev/the-prompting-trick-that-fixed-my-ai-image-generation-3ge4</link>
      <guid>https://dev.to/nadinev/the-prompting-trick-that-fixed-my-ai-image-generation-3ge4</guid>
      <description>&lt;p&gt;Today I'm going to show you a cognitive trick that works in prompting. It's based on how our brains (and language models) actually process language. Always tell the AI what TO do, never what NOT to do.&lt;/p&gt;

&lt;p&gt;This technique took my success rate from 0% to 100%. It's how I generate high-quality images with older models.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem: Negation in Constraint Specification
&lt;/h2&gt;

&lt;p&gt;Consider how most people write instructions to image models:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"A cat, not wearing a hat, blue background, no people, without red tones"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is the baseline. It's how we naturally write constraints. We think of what we DON'T want and express it.&lt;/p&gt;

&lt;p&gt;But this forces the model to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Think about a cat with a hat&lt;/li&gt;
&lt;li&gt;Think about red&lt;/li&gt;
&lt;li&gt;Think about people&lt;/li&gt;
&lt;li&gt;Then try to not include them&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The model has to process the forbidden concepts in order to avoid them. Sometimes this works. Sometimes it fails. And when it fails, the model often outputs exactly what it was supposed to avoid.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Hypothesis
&lt;/h2&gt;

&lt;p&gt;What if instead we used affirmative framing? What if we never mentioned what to avoid, and instead only specified what to include?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Instead of:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"A cat, not wearing a hat, blue background, no people, without red tones"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;We write:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"A cat with a bare head, blue background, only the cat present, blue color palette"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Notice the difference. In the second version, we never mention red. We never mention hats or people. We only specify what we DO want. There's no negation to process. There's no forbidden concept to think about.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Experiment: Testing with FLUX
&lt;/h2&gt;

&lt;p&gt;I tested this hypothesis using FLUX (via Pollinations API) with a simple constraint: generate an image of a cat with no hat, blue background, no red elements.&lt;/p&gt;

&lt;h3&gt;
  
  
  Condition 1: Baseline (Negation)
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;"A cat, not wearing a hat, blue background, no people, without red tones"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Condition 2: Affirmative Framing
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;"A cat with bare head, blue background, only the cat present, blue color palette"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I generated 10 images for each condition and evaluated them on a simple pass/fail basis: Did the image follow the constraints?&lt;/p&gt;
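&lt;p&gt;Each generation was a single HTTP GET against Pollinations, varying only the seed. A minimal sketch of how the batch can be constructed (the endpoint shape and the &lt;code&gt;model&lt;/code&gt;/&lt;code&gt;seed&lt;/code&gt; parameters are assumptions based on Pollinations' public URL API—verify against their docs):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from urllib.parse import quote

# Assumed Pollinations image endpoint; the prompt is URL-encoded into the path.
BASE = "https://image.pollinations.ai/prompt/"

def image_url(prompt, seed, model="flux"):
    return f"{BASE}{quote(prompt)}?model={model}&amp;seed={seed}"

negation = "A cat, not wearing a hat, blue background, no people, without red tones"
affirmative = "A cat with bare head, blue background, only the cat present, blue color palette"

# Ten fixed seeds per condition keeps the comparison reproducible.
condition_1 = [image_url(negation, seed) for seed in range(10)]
condition_2 = [image_url(affirmative, seed) for seed in range(10)]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;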




&lt;h2&gt;
  
  
  Results: The Affirmative Framing Breakthrough
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Condition 1 (Negation Baseline): 0% Success Rate
&lt;/h3&gt;

&lt;p&gt;The negation approach failed completely. &lt;strong&gt;All 10 images violated the core constraints&lt;/strong&gt;—every single one included hats, red elements, or both, despite explicit instructions to avoid them.&lt;/p&gt;

&lt;p&gt;The pattern was striking: the model didn't just occasionally fail—it consistently &lt;em&gt;added&lt;/em&gt; the negated elements. Red hats appeared in 8 out of 10 images despite "without red tones" in the prompt. It's as if mentioning "not wearing a hat" made the model think about hats, and mentioning "without red" made it think about red.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsvyb1ide9y9g9q7ir1vx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsvyb1ide9y9g9q7ir1vx.png" alt="Condition 1 Negation Control Results" width="800" height="325"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 1: Condition 1 Results (Negation Baseline). Prompt: "A cat, not wearing a hat, blue background, no people, without red tones." All 10 images failed—every cat has a hat, and most have prominent red elements despite explicit instructions to avoid them.&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"To understand 'not red,' the model must first think about red."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Condition 2 (Affirmative Framing): 100% Success Rate
&lt;/h3&gt;

&lt;p&gt;Every single image was perfect.&lt;/p&gt;

&lt;p&gt;All 10 runs showed a bare-headed cat against a blue background with no red elements. The consistency was remarkable—the cats all had the same quality of bare-headedness, and the backgrounds were consistent shades of blue.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The improvement: From 0% to 100%&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In Condition 1, every image failed unpredictably. In Condition 2, every image succeeded consistently.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgbd1euw9yxgpmmqpfnx0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgbd1euw9yxgpmmqpfnx0.png" alt="Condition 2 Affirmative Framing Results" width="800" height="325"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 2: Condition 2 Results (Affirmative Framing). Prompt: "A cat with bare head, blue background, only the cat present, blue color palette." All 10 images succeeded with remarkable visual consistency. No hats, no red—just what we asked for.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Cross-Model Validation: Stable Diffusion XL
&lt;/h2&gt;

&lt;p&gt;To confirm these findings weren't specific to FLUX, I ran the same experiment on Stable Diffusion XL—a completely different architecture with different training data.&lt;/p&gt;

&lt;p&gt;Interestingly, SDXL handled some negation constraints better than FLUX. For the color test ("no blue sky"), SDXL creatively stylized the image to avoid the problem entirely. This suggests SDXL may be better trained on negation handling—but it still failed on most constraint types.&lt;/p&gt;

&lt;h3&gt;
  
  
  SDXL Results Summary
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Constraint Type&lt;/th&gt;
&lt;th&gt;Negation&lt;/th&gt;
&lt;th&gt;Affirmative&lt;/th&gt;
&lt;th&gt;Winner&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Color&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Stylized (avoided blue)&lt;/td&gt;
&lt;td&gt;✅ Gray sky&lt;/td&gt;
&lt;td&gt;Tie&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Object&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌ Fruit bowl appeared&lt;/td&gt;
&lt;td&gt;✅ Clean table&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Affirmative&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Attribute&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌ Orange cat appeared&lt;/td&gt;
&lt;td&gt;✅ Gray tabby&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Affirmative&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Counting&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌ Multiple people&lt;/td&gt;
&lt;td&gt;✅ Single figure&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Affirmative&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Spatial&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌ Trees everywhere&lt;/td&gt;
&lt;td&gt;✅ Open field&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Affirmative&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Weather&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Overcast&lt;/td&gt;
&lt;td&gt;✅ Overcast&lt;/td&gt;
&lt;td&gt;Tie&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.imghippo.com%2Ffiles%2FTZv9996guc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.imghippo.com%2Ffiles%2FTZv9996guc.png" alt="SDXL Comparison Grid" width="800" height="251"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 3: SDXL Results. SDXL showed better negation handling than FLUX (note the stylized car image avoiding blue sky), but still failed on most constraint types. Affirmative framing won or tied every test.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Affirmative framing won 4 tests, tied 2, and lost none.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;💡 Even with a better-trained model like SDXL, affirmative framing never loses. It either wins or ties. This makes it the safer, more reliable choice regardless of which model you're using.&lt;/p&gt;




&lt;h2&gt;
  
  
  Bonus Finding: Negative Prompt Fields Don't Fully Solve This
&lt;/h2&gt;

&lt;p&gt;I also tested using FLUX's negative prompt feature—putting affirmative language in the main prompt and forbidden elements in a separate negative prompt field.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Positive:&lt;/strong&gt; "A cat with bare head, blue background, centered composition"&lt;br&gt;
&lt;strong&gt;Negative:&lt;/strong&gt; "hat, people, red, accessories, clutter"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Surprisingly, this performed &lt;em&gt;worse&lt;/em&gt; than pure affirmative framing. Red elements crept back in (collars, accessories, background elements), and some images even showed party hats.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq9napkrloqrt2dgg6x8z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq9napkrloqrt2dgg6x8z.png" alt="Condition 3 Results" width="800" height="325"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 4: Even with forbidden elements in a dedicated negative prompt field, red accessories appeared in most images. The negative prompt still activates the forbidden concepts.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The takeaway:&lt;/strong&gt; Even purpose-built negative prompt features can't fully escape the negation problem. Pure affirmative framing remains the most reliable approach.&lt;/p&gt;




&lt;h2&gt;
  
  
  Unexpected Finding: The Gemini Automation Failure
&lt;/h2&gt;

&lt;p&gt;This is where the story gets interesting.&lt;/p&gt;

&lt;p&gt;I decided to automate the experiment. Why manually write affirmative framings when I could have an LLM generate them?&lt;/p&gt;

&lt;p&gt;I built a simple app that asked Gemini Pro 3 to generate test conditions. For the affirmative framing condition, I specified:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Generate an affirmative framing that reframes the constraint into positive instruction, focusing on what TO include rather than what to avoid."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Gemini reframed the negative constraint "no red" by focusing on "non-red colors" and "colors other than red."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It still used negation.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"Colors other than red" is negation—just rephrased. The model never escaped the negation frame.&lt;/p&gt;

&lt;p&gt;I tried again, more explicitly:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"CRITICAL: Do NOT mention red or any excluded colors. Only specify colors that ARE allowed. Use positive language only."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Gemini still generated prompts using "colors other than red."&lt;/p&gt;

&lt;p&gt;It failed twice. Only manual rewriting produced pure affirmative language:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Describe a colorful scene using vibrant blues, electric greens, bright yellows, warm oranges, deep purples, and cool silvers."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This automation failure is itself a major finding: &lt;strong&gt;Even advanced language models struggle to generate pure affirmative framing.&lt;/strong&gt; Models are trained on human language, and human language defaults to negation.&lt;/p&gt;




&lt;h2&gt;
  
  
  Practical Rules for Better Prompts
&lt;/h2&gt;

&lt;p&gt;Based on these findings, here are concrete rules for writing better prompts:&lt;/p&gt;

&lt;h3&gt;
  
  
  Rule 1: Never Use Negation in Constraints
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Instead of:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Don't include people in the background, don't use harsh lighting, avoid reflections"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Use:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Show only the subject. Use soft, diffused lighting. Keep surfaces matte and non-reflective."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Rule 2: Be Specific About What IS Present
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Weak:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"A blue background"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Strong:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"A vivid, saturated blue background occupying 80% of the frame, gradient from bright blue at top to deeper blue at bottom"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Rule 3: List Desired Elements Explicitly
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Weak:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"A professional photo without amateur mistakes"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Strong:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"A professional product photo with: sharp focus on the product, even studio lighting, neutral background, shallow depth of field, natural colors"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Rule 4: Use Positive, Action-Oriented Language
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Don't&lt;/th&gt;
&lt;th&gt;Do&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;"Avoid corporate jargon"&lt;/td&gt;
&lt;td&gt;"Use clear, simple vocabulary"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Don't make it dark"&lt;/td&gt;
&lt;td&gt;"Use bright lighting"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Without unnecessary details"&lt;/td&gt;
&lt;td&gt;"Include only essential information"&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  What This Reveals About How Models Work
&lt;/h2&gt;

&lt;p&gt;Models process language the way they were trained to: like humans do. That's actually the problem.&lt;/p&gt;

&lt;p&gt;When you write "don't include red," the model processes it the same way your brain does—by first activating the concept of "red" to understand what to avoid. For humans, this conscious activation is easy to suppress. For models, that activation becomes part of the output.&lt;/p&gt;

&lt;p&gt;The difference isn't that models think differently. It's that models can't consciously &lt;em&gt;decide&lt;/em&gt; to ignore an activated concept the way you can. They generate based on what's most salient in their processing. And when you mention red—even to forbid it—you've made red salient.&lt;/p&gt;

&lt;p&gt;When you write "include blue and green," there's no competing concept to suppress. The model simply processes what you asked for.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This is why affirmative framing works: it removes the conflicting activation entirely.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Automation Failure: A Cautionary Note
&lt;/h2&gt;

&lt;p&gt;The fact that Gemini struggled to generate pure affirmative framing matters. When I asked it to reframe, it understood the task but couldn't do it. It kept generating "colors other than red" instead of just listing the colors to use.&lt;/p&gt;

&lt;p&gt;This reveals something important: &lt;strong&gt;Affirmative framing is not the model's default behavior.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Models learn from human language. Human language defaults to negation. So when you ask a model to generate affirmative instructions, you're asking it to do something contrary to its training.&lt;/p&gt;

&lt;p&gt;The solution? Be explicit about what you want. Show examples. Specify the structure. Don't assume the model knows what affirmative framing means—teach it.&lt;/p&gt;
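&lt;p&gt;In practice, it also helps to lint generated prompts for negation before sending them to the image model. A rough sketch (the marker list is an illustrative heuristic, not an exhaustive grammar of English negation):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import re

# Heuristic negation linter for generated prompts.
NEGATION_MARKERS = [
    r"\bno\b", r"\bnot\b", r"\bnever\b", r"\bwithout\b",
    r"\bavoid\b", r"\bdon'?t\b", r"\bother than\b", r"\bexcept\b",
]

def find_negations(prompt):
    lowered = prompt.lower()
    return [m for m in NEGATION_MARKERS if re.search(m, lowered)]

# Gemini's "reframed" prompt still fails the lint:
print(find_negations("colors other than red"))
# The manually rewritten prompt passes:
print(find_negations("vibrant blues, electric greens, bright yellows"))  # []
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If the lint fires, regenerate or rewrite by hand—don't trust the model to self-correct.&lt;/p&gt;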




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Stop fighting against how AI models process language. Speak their language: be direct, specific, and always frame instructions positively.&lt;/p&gt;

&lt;p&gt;The results speak for themselves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;From 0% to 100% success rate&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Perfect consistency&lt;/strong&gt; instead of total failure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Validated across multiple models&lt;/strong&gt; (FLUX and Stable Diffusion XL)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Works across constraint types&lt;/strong&gt; (color, objects, attributes, spatial, counting)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Next time you write a prompt, forget about what you don't want. Focus on what you do. Be specific. Be direct. Be affirmative.&lt;/p&gt;

&lt;p&gt;The model will understand.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Agentic Bitcoin24</title>
      <dc:creator>Nadine </dc:creator>
      <pubDate>Sat, 08 Nov 2025 22:22:01 +0000</pubDate>
      <link>https://dev.to/nadinev/agentic-bitcoin24-3946</link>
      <guid>https://dev.to/nadinev/agentic-bitcoin24-3946</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/tigerdata-2025-10-15"&gt;Agentic Postgres Challenge with Tiger Data&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  💡 What I Built
&lt;/h2&gt;

&lt;p&gt;I built &lt;strong&gt;Agentic Bitcoin24&lt;/strong&gt;, a Bitcoin price tracker that &lt;strong&gt;never goes down&lt;/strong&gt;, even when its primary data source fails. It's a growing database that gains value over time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Live Application:&lt;/strong&gt; &lt;a href="https://bitcoin24-delta.vercel.app/" rel="noopener noreferrer"&gt;Agentic Bitcoin24&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F29s4qxuy1v7h4us1mxf2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F29s4qxuy1v7h4us1mxf2.png" alt=" " width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  🎯 Zero-Downtime Resilience
&lt;/h3&gt;

&lt;p&gt;When the CoinGecko API fails (rate limits, outages, network issues), the site &lt;strong&gt;automatically falls back&lt;/strong&gt; to Tiger Data's TimescaleDB cache. Users never see an error (they don't even know the switch happened).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Benefits:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🎯 &lt;strong&gt;Zero Downtime&lt;/strong&gt; - Site stays live during external API outages&lt;/li&gt;
&lt;li&gt;💰 &lt;strong&gt;0.31% API Usage&lt;/strong&gt; - Only &lt;strong&gt;31 calls per month&lt;/strong&gt; vs 10,000 limit&lt;/li&gt;
&lt;li&gt;⚡ &lt;strong&gt;Instant Response&lt;/strong&gt; - Tiger Data cache = no external API latency&lt;/li&gt;
&lt;li&gt;🔄 &lt;strong&gt;Transparent Fallback&lt;/strong&gt; - Users are unaware of the data source switch&lt;/li&gt;
&lt;li&gt;📈 &lt;strong&gt;10-Year Sustainability&lt;/strong&gt; - Will run for the next decade on the free tier&lt;/li&gt;
&lt;/ul&gt;
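&lt;p&gt;The fallback itself is ordinary control flow: try the live API, and on any failure serve the newest cached row instead. A simplified sketch (the function names here are hypothetical stand-ins; the real app's cache read is a TimescaleDB query):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Simplified fallback sketch: fetch_live and read_cache stand in for
# the real CoinGecko HTTP call and the TimescaleDB query.
def get_price(fetch_live, read_cache):
    try:
        return {"price": fetch_live(), "source": "coingecko"}
    except Exception:
        # Rate limit, outage, timeout—any failure falls back silently.
        return {"price": read_cache(), "source": "cache"}

def rate_limited():
    raise TimeoutError("CoinGecko rate limited")

print(get_price(rate_limited, lambda: 97250.0))
# {'price': 97250.0, 'source': 'cache'}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The frontend only ever sees a price and a source label, which is why users never notice the switch.&lt;/p&gt;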




&lt;h2&gt;
  
  
  🛢️ How I Used Agentic Postgres
&lt;/h2&gt;

&lt;p&gt;Behind the scenes, &lt;strong&gt;three autonomous agents&lt;/strong&gt; manage the entire database lifecycle - no manual SQL required.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://youtu.be/aDuFx3NSBwk" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fothx6mdoakhviiirl8v5.png" alt="Watch the Demo" width="800" height="448"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  🎬 The Agent Collaboration Model
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Agent&lt;/th&gt;
&lt;th&gt;Responsibility&lt;/th&gt;
&lt;th&gt;Actions&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;1. Design Agent&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Agnostic database design and ingestion.&lt;/td&gt;
&lt;td&gt;• Reads external API response and automatically designs a matching SQL schema. • Creates general-purpose tables (e.g., standard SQL or JSONB) based on user input.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;2. Optimize Agent&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Transforms and tunes existing database.&lt;/td&gt;
&lt;td&gt;• Analyzes the Design Agent's generic schema for time-series patterns. • Enables TimescaleDB compression and implements automated compression policies. &lt;strong&gt;Safety Protocol:&lt;/strong&gt; • Applies changes like indexing or compression policies only after visual confirmation and user approval.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;3. Monitoring Agent&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Gathers database metrics.&lt;/td&gt;
&lt;td&gt;• Real-time API health checks. • Performance monitoring and visualization.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The agents autonomously:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Monitor API health&lt;/strong&gt; in real-time&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Switch tabs&lt;/strong&gt; (SQL Editor → Charts → API Monitor)
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Execute optimizations&lt;/strong&gt; (indexing, compression)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Visualize results&lt;/strong&gt; (Chart.js dashboards)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Provide safety guidance&lt;/strong&gt; before applying changes&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🏗️ The Workflow
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Daily Ingestion (Vercel Cron)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Fetch 24 hours of Bitcoin price data (1 API call)
2. Design Agent creates/updates schema automatically
3. Optimize Agent analyzes and tunes performance
4. TimescaleDB compression stores historical record
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
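&lt;p&gt;The ingestion step above can be sketched as a small transform. This is a hypothetical Python sketch: the payload shape mirrors CoinGecko's market-chart response, and &lt;code&gt;to_rows&lt;/code&gt; is an illustrative helper, not code from the project.&lt;/p&gt;

```python
# Hypothetical sketch of the daily ingestion step, assuming a CoinGecko-style
# payload: {"prices": [[timestamp_ms, usd_price], ...]} with 24 hourly points.
from datetime import datetime, timezone

def to_rows(payload):
    """Flatten the API payload into (timestamp, price) rows for insertion."""
    rows = []
    for ts_ms, price in payload["prices"]:
        ts = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
        rows.append((ts, float(price)))
    return rows

sample = {"prices": [[1735689600000, 93421.5], [1735693200000, 93515.2]]}
print(to_rows(sample)[0][1])  # 93421.5
```

&lt;p&gt;A single call like this yields all 24 rows for the day, which the Design Agent can then insert in one batch.&lt;/p&gt;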



&lt;p&gt;&lt;strong&gt;Real-Time Monitoring&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CoinGecko API Health Check (every 30s)
   ↓
✅ ONLINE  → Fetch fresh data
❌ OFFLINE → Automatic fallback to Tiger Data cache
   ↓
Zero downtime for users
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
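&lt;p&gt;The fallback path can be reduced to a few lines. This is a minimal sketch under stated assumptions: &lt;code&gt;fetch_live&lt;/code&gt; raises on an outage and &lt;code&gt;fetch_cached&lt;/code&gt; reads the Tiger Data copy; both names are hypothetical.&lt;/p&gt;

```python
# Minimal sketch of the transparent fallback: try the external API first,
# and on any failure serve the locally cached data instead.
def get_prices(fetch_live, fetch_cached):
    try:
        return fetch_live(), "live"
    except Exception:
        # External API offline: fall back to the Tiger Data cache.
        return fetch_cached(), "cache"

def broken_api():
    raise ConnectionError("CoinGecko unreachable")

data, source = get_prices(broken_api, lambda: [93421.5])
print(source)  # cache
```

&lt;p&gt;Because the caller only sees the returned data, users never notice which source answered.&lt;/p&gt;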






&lt;h2&gt;
  
  
  🛢️ How I Used Tiger Data + Claude
&lt;/h2&gt;

&lt;p&gt;I used &lt;strong&gt;Tiger CLI (MCP)&lt;/strong&gt; + &lt;strong&gt;Claude Code&lt;/strong&gt; to build the entire system without writing manual SQL:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tiger CLI helped agents learn TimescaleDB-specific operations (&lt;code&gt;convert_to_hypertable&lt;/code&gt;, &lt;code&gt;add_compression&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Claude Code refined the &lt;code&gt;create_zero_copy_fork&lt;/code&gt; logic and intelligent fallback strategies&lt;/li&gt;
&lt;li&gt;The agents operate in a &lt;strong&gt;chat interface&lt;/strong&gt; where I can say: &lt;em&gt;"Create a database for Bitcoin prices"&lt;/em&gt; and watch them work&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Constraint-Aware Optimization
&lt;/h3&gt;

&lt;p&gt;The Optimize Agent maximizes TimescaleDB's compression capabilities through deep reasoning about storage efficiency:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automatically enables compression with proper time-column ordering&lt;/li&gt;
&lt;li&gt;Implements compression policies (auto-compress data older than 30 days)&lt;/li&gt;
&lt;li&gt;Projects long-term capacity and recommends optimizations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When resource constraints prevent certain operations, the agent intelligently adapts by requiring user validation, ensuring all storage optimizations are reviewed before execution.&lt;/p&gt;
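&lt;p&gt;For reference, the compression setup described above boils down to two TimescaleDB statements. The table and column names here are hypothetical; the &lt;code&gt;timescaledb.compress&lt;/code&gt; setting and &lt;code&gt;add_compression_policy&lt;/code&gt; are standard TimescaleDB APIs.&lt;/p&gt;

```python
# The SQL the Optimize Agent would propose (shown as strings, since the agent
# presents them for user approval before execution). Table name is hypothetical.
TABLE = "btc_prices"

# Enable compression, ordering segments by the time column.
enable_compression = (
    f"ALTER TABLE {TABLE} SET ("
    "timescaledb.compress, "
    "timescaledb.compress_orderby = 'ts DESC');"
)

# Auto-compress chunks once they are older than 30 days.
policy = f"SELECT add_compression_policy('{TABLE}', INTERVAL '30 days');"

print(policy)
```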




&lt;h2&gt;
  
  
  📈 The 10-Year Sustainability Model
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The Math:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Free tier: 10,000 API calls/month&lt;/li&gt;
&lt;li&gt;My usage: 31 calls/month (0.31%)&lt;/li&gt;
&lt;li&gt;Sustainability: &lt;strong&gt;322 months = 26+ years&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why 10+ Years:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;With TimescaleDB compression enabled on the time-series data:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Daily Bitcoin prices (24 hourly data points) = ~2KB per day&lt;/li&gt;
&lt;li&gt;Compressed storage: ~730KB per year&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;750MB ÷ 730KB/year ≈ 1,027 years of compressed data&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But realistically, accounting for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Schema overhead&lt;/li&gt;
&lt;li&gt;Indexes and metadata&lt;/li&gt;
&lt;li&gt;Query logs&lt;/li&gt;
&lt;li&gt;Potential data expansion&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Conservative estimate: 10+ years&lt;/strong&gt; of continuous operation without hitting storage limits.&lt;/p&gt;
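&lt;p&gt;The arithmetic above checks out in a few lines:&lt;/p&gt;

```python
# Verifying the sustainability math from the figures above.
free_tier_calls = 10_000      # API calls per month on the free tier
my_calls = 31                 # actual calls per month

months_of_headroom = free_tier_calls // my_calls
print(months_of_headroom)        # 322
print(months_of_headroom // 12)  # 26  (years)

# Storage: ~730 KB of compressed data per year against a 750 MB quota.
years_of_storage = (750 * 1000) // 730
print(years_of_storage)          # 1027
```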




&lt;h2&gt;
  
  
  🌟 Overall Experience
&lt;/h2&gt;

&lt;p&gt;Most apps &lt;strong&gt;fail gracefully&lt;/strong&gt;; this one &lt;strong&gt;doesn't fail at all&lt;/strong&gt;.&lt;br&gt;
I solved the data volatility problem by providing clean, 24-hour historical Bitcoin data: not by collecting data 24/7, but by ingesting 24 hourly data points once per day.&lt;/p&gt;

&lt;p&gt;The system is safe to run indefinitely and will store relevant data for &lt;strong&gt;10+ years&lt;/strong&gt; while costing &lt;strong&gt;nothing&lt;/strong&gt; to maintain.&lt;/p&gt;

&lt;p&gt;I basically hired agents who work for free and never sleep! 🎉&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>agenticpostgreschallenge</category>
      <category>ai</category>
      <category>postgres</category>
    </item>
    <item>
      <title>How I Built a Secret Agent</title>
      <dc:creator>Nadine </dc:creator>
      <pubDate>Sat, 25 Oct 2025 15:59:48 +0000</pubDate>
      <link>https://dev.to/nadinev/how-i-built-a-secret-agent-4p48</link>
      <guid>https://dev.to/nadinev/how-i-built-a-secret-agent-4p48</guid>
      <description>&lt;p&gt;I recently made an accidental but interesting discovery while building an app. I managed to create an agent-like system using nothing more than Gemini's function calling feature, effectively building an agent’s brain without the traditional, continuous infrastructure required to host a full agent.&lt;/p&gt;

&lt;p&gt;The key finding❓ This $0/hr serverless approach not only significantly reduced infrastructure costs but also proved to be a far more helpful debugger than the broad, general-purpose agent provided by my IDE.&lt;/p&gt;




&lt;h2&gt;
  
  
  ֎ Persistent Agents
&lt;/h2&gt;

&lt;p&gt;Traditional AI agents (which I call Persistent Agents) require continuous hosting using managed services and underlying infrastructure. Big tech companies are offering impressive designer spaces and no-code interfaces, but this can quickly become prohibitively expensive.&lt;/p&gt;

&lt;p&gt;The issue lies in the idle cost. Immediately upon deployment, infrastructure is required to host the agent. Even if the agent is inactive or receiving no traffic, at least one compute node is required to run the service, and these costs are incurred continuously, often hourly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;So what exactly does this buy you, anyway?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A persistent agent is generally equipped with tools and can use them to perform:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Complex, multi-step reasoning.&lt;/li&gt;
&lt;li&gt;Dynamic decision-making on when and how to call tools.&lt;/li&gt;
&lt;li&gt;Management of long-running conversational memory.&lt;/li&gt;
&lt;li&gt;External actions, like authenticating on your behalf (when permission is granted).&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🛠️ Function Calling as Your Agent
&lt;/h2&gt;

&lt;p&gt;I realised that for my application's specific workflows, the most valuable part of an agent was its dynamic reasoning and ability to use tools, not its continuous hosting status; I had no need for external actions.&lt;/p&gt;

&lt;p&gt;I decided to capture the core functionality of an agent without the overhead of continuous deployment. I applied tool-use logic directly via Gemini’s function calling. The tools themselves, including the logic for search, retrieval, etc., are hardcoded into my conversational frontend.&lt;/p&gt;

&lt;p&gt;The AI's role becomes the &lt;em&gt;Stateless Agent 🧠&lt;/em&gt;. It uses function calling to translate the user’s natural language query into a structured function call and arguments. &lt;/p&gt;

&lt;p&gt;The application executes the call, and the resulting data is sent back to the model for a natural language response to the user.&lt;/p&gt;

&lt;p&gt;Since I am already making calls to the Gemini model for text generation and other things, this method allows me to combine the reasoning and response steps into a single API call, reducing the transaction cost. This is how I anticipate achieving an 80% reduction in operating costs compared to maintaining a persistent agent infrastructure.&lt;/p&gt;
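&lt;p&gt;The loop described above can be sketched in a framework-agnostic way: the model returns a structured call, the application dispatches it to a hardcoded tool, and the result goes back to the model for phrasing. The &lt;code&gt;fuzzy_search&lt;/code&gt; tool and the registry below are hypothetical stand-ins, not my actual implementation.&lt;/p&gt;

```python
# Sketch of the stateless-agent dispatch step. The tools are hardcoded in the
# frontend; the model only chooses which one to call and with what arguments.
def fuzzy_search(query):
    """Hypothetical fallback tool: naive substring match over a tiny corpus."""
    corpus = ["vector search basics", "CORS configuration guide"]
    return [doc for doc in corpus if query.lower() in doc.lower()]

TOOLS = {"fuzzy_search": fuzzy_search}

def execute_call(call):
    """Dispatch a model-produced function call to a registered tool."""
    fn = TOOLS[call["name"]]
    return fn(**call["args"])

# Simulate the model emitting a structured call for a user query.
model_call = {"name": "fuzzy_search", "args": {"query": "cors"}}
print(execute_call(model_call))  # ['CORS configuration guide']
```

&lt;p&gt;The tool's return value is then passed back to the model in the same conversation turn, which is what lets the reasoning and response steps share one API call.&lt;/p&gt;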




&lt;h2&gt;
  
  
  🪲 How I Discovered My Agent
&lt;/h2&gt;

&lt;p&gt;My application is designed to fall back to a fuzzy text-matching search when vector search fails. I was coding in my IDE with a popular code assistant model running. Yet, my search pipeline was failing, and the IDE agent could not find the issue. It was writing new unit tests that were passing in the development environment but failing repeatedly in production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The agent was overcomplicating things&lt;/strong&gt;, drowning in the specifics of the code, unit tests, and the immediate task. Each time I summarised the issue, its lack of persistent memory about the operational environment made it feel like I was talking to a blank slate.&lt;/p&gt;

&lt;p&gt;Finally, in sheer desperation, I ran my own application’s frontend and typed into the message input: “&lt;em&gt;What is the problem??&lt;/em&gt;”&lt;/p&gt;

&lt;p&gt;The response from my little agent's brain was immediate and shockingly direct. It informed me that it could not communicate with the backend and, therefore, could not perform the search function it was supposed to execute.&lt;/p&gt;

&lt;p&gt;The issue, it turned out, was a simple CORS policy error preventing the frontend from communicating with the backend. The traditional IDE agent was trapped in code complexity; my function-calling agent could immediately identify what was wrong.&lt;/p&gt;




&lt;h2&gt;
  
  
  🔒 The Security Lesson in Focus
&lt;/h2&gt;

&lt;p&gt;This unexpected diagnostic capability is actually due to its architectural limitations. The agent was forced to reason only about the predefined tool functions available in its system instructions.&lt;/p&gt;

&lt;p&gt;I then asked it how it was performing the search. It began referencing internal file paths and implementation details. This was an unintended data leak because I had not provided specific instructions or response settings on how to constrain its reply.&lt;/p&gt;

&lt;p&gt;That’s the real value of the &lt;em&gt;Stateless Agent&lt;/em&gt;: it lives intrinsically inside the code's purpose, defined solely by the functions it is permitted to use. It doesn't need vast context; it needs focused context. &lt;/p&gt;

&lt;p&gt;The biggest takeaway from this experiment is that the most valuable tooling isn't necessarily a massive, stateful "IDE Agent" that watches your every keystroke. Instead, there is real value in composing stateless, focused expert agents that live intrinsically inside the purpose of the code. &lt;/p&gt;

</description>
      <category>agents</category>
      <category>serverless</category>
      <category>architecture</category>
      <category>llm</category>
    </item>
  </channel>
</rss>
