<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Nadine </title>
    <description>The latest articles on DEV Community by Nadine  (@nadinev).</description>
    <link>https://dev.to/nadinev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3232377%2F6ced2e7e-bd7e-4baf-8a96-29a220663fc5.png</url>
      <title>DEV Community: Nadine </title>
      <link>https://dev.to/nadinev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/nadinev"/>
    <language>en</language>
    <item>
      <title>Building an AI WhatsApp Agent with OpenClaw: A Practical Field Guide</title>
      <dc:creator>Nadine </dc:creator>
      <pubDate>Mon, 27 Apr 2026 02:08:26 +0000</pubDate>
      <link>https://dev.to/nadinev/building-an-ai-whatsapp-agent-with-openclaw-a-practical-field-guide-51kc</link>
      <guid>https://dev.to/nadinev/building-an-ai-whatsapp-agent-with-openclaw-a-practical-field-guide-51kc</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/openclaw-2026-04-16"&gt;OpenClaw Writing Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  About this Series
&lt;/h2&gt;

&lt;p&gt;I built an agent to monitor and respond to my WhatsApp messages: it manages memory, history, and relationships with contacts, and runs on a blazing-fast inference layer within a capped token budget.&lt;/p&gt;

&lt;p&gt;Most of what you'll read here I learned the hard way.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;A five-part series on building a real, production-minded AI agent: multilingual, multimodal, and connected to WhatsApp on a 1M token/day budget.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkkmbirwn8trbjvjo54w9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkkmbirwn8trbjvjo54w9.png" alt="Architecture diagram of OpenClaw showing the layered relationship between Brain, Voice, Senses, and Connection." width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Title&lt;/th&gt;
&lt;th&gt;What You'll Learn&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;01&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;&lt;a href="https://dev.to/nadinev/the-brain-setting-up-openclaw-jd9"&gt;(The Brain)  Setting Up OpenClaw&lt;/a&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Installing OpenClaw, choosing your model, configuring the &lt;code&gt;main&lt;/code&gt; agent, workspace layout, context compaction, and establishing a markdown contract for consistent output&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;02&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;&lt;a href="https://dev.to/nadinev/the-voice-multilingual-layer-4mf0"&gt;(The Voice) Multilingual Layer&lt;/a&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Building Silas the Language Sentry, automatic language detection, multilingual response handling, and how this connects to the WhatsApp bridge&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;03&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;&lt;a href="https://dev.to/nadinev/the-senses-image-generation-media-266k"&gt;(The Senses) Image Generation &amp;amp; Media&lt;/a&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Working with &lt;code&gt;tools.deny&lt;/code&gt; and &lt;code&gt;tools.media&lt;/code&gt; scopes, owner-only image generation, deny-first permission design, and managing latency UX for media responses&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;04&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;&lt;a href="https://dev.to/nadinev/the-connection-whatsapp-bridge-1962"&gt;(The Connection) WhatsApp Bridge&lt;/a&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Setting up the gateway (token + loopback), Docker deployment pattern, WhatsApp channel config, session management, and group handling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;05&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;&lt;a href="https://dev.to/nadinev/future-outlook-operating-model-8jp"&gt;Future Outlook &amp;amp; Operating Model&lt;/a&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;End-to-end system flow, ops checklist, Lingo and Tailscale on the roadmap, and a full recommended reading order for the series&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Companion (deep dive, not a numbered part):&lt;/strong&gt; &lt;code&gt;OpenClaw Skill Shield: Multilingual Edition&lt;/code&gt; — Skill Shield, identity leakage, multilingual gap, and config tables.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>openclawchallenge</category>
    </item>
    <item>
      <title>Future Outlook &amp; Operating Model</title>
      <dc:creator>Nadine </dc:creator>
      <pubDate>Mon, 27 Apr 2026 02:07:38 +0000</pubDate>
      <link>https://dev.to/nadinev/future-outlook-operating-model-8jp</link>
      <guid>https://dev.to/nadinev/future-outlook-operating-model-8jp</guid>
      <description>&lt;h3&gt;
  
  
  &lt;strong&gt;The Catalyst: A System, Not a Demo&lt;/strong&gt;
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;OpenClaw stops being a toy the day you &lt;strong&gt;run it for a week&lt;/strong&gt;: models change, skills update, logs grow, and someone new will try the one message you did not test. The &lt;em&gt;Practical Guide&lt;/em&gt; is not a single prompt; it is a &lt;strong&gt;repeatable stack&lt;/strong&gt;: Brain, Voice, Senses, Connection, plus the boring discipline of operations.&lt;/p&gt;

&lt;p&gt;This final article in the series ties the four phases together, lists an &lt;strong&gt;operating checklist&lt;/strong&gt; you can run monthly, and names future directions (Lingo-style translation, remote gateway access) without pretending they are free.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Overview: The End-to-End Picture&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Flow (conceptual):&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;WhatsApp (or other channel)&lt;/strong&gt; delivers a message to the &lt;strong&gt;Gateway&lt;/strong&gt; (auth, routing).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Session&lt;/strong&gt; scoping and idle/maintenance apply (&lt;code&gt;dmScope&lt;/code&gt;, reset, prune).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Silas&lt;/strong&gt; (Voice) can pre-screen; &lt;strong&gt;Senses&lt;/strong&gt; (media / image) obey allow-scopes and deny-lists.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model&lt;/strong&gt; in OpenClaw produces a reply; &lt;strong&gt;Logging&lt;/strong&gt; can redact sensitive tool content.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Workspace&lt;/strong&gt; and optional memory files back long-lived intent — under &lt;strong&gt;Brain&lt;/strong&gt; policy.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A simple mental diagram:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User → Channel → Gateway (auth) → Session(key) → Skills + Tools → Model → Reply
                                    ↑
                         Workspace (identity, user, SOUL) + openclaw.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Connection recap:&lt;/em&gt; I run that &lt;strong&gt;gateway&lt;/strong&gt; as a &lt;strong&gt;normal process&lt;/strong&gt; on the host, not in a container; part 4 is the source of truth for how the WhatsApp bridge and allowlists fit together.&lt;/p&gt;




&lt;h3&gt;
  
  
  In this section:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;1. Operating Model: Weekly Habits&lt;/li&gt;
&lt;li&gt;2. Safety Checklist (First Deploy + Ongoing)&lt;/li&gt;
&lt;li&gt;3. Future Outlook: Translation and Lingodotdev&lt;/li&gt;
&lt;li&gt;4. Remote Access: Tailscale vs Expose Port&lt;/li&gt;
&lt;li&gt;5. Ecosystem and Ethos&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;1. Operating Model: Weekly Habits&lt;/strong&gt;
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Habit&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Check &lt;code&gt;openclaw.json&lt;/code&gt; in git&lt;/strong&gt; (if you version it) or diff against backup&lt;/td&gt;
&lt;td&gt;Catches “one-line” regressions (deny list, allowFrom, new tools).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Rotate&lt;/strong&gt; &lt;code&gt;${OPENCLAW_GATEWAY_TOKEN}&lt;/code&gt; on any hint of leak; restart gateway.&lt;/td&gt;
&lt;td&gt;Prevents silent MITM in your own LAN / tunnel misconfig.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Re-read &lt;code&gt;SOUL.md&lt;/code&gt; and &lt;code&gt;SKILL.md&lt;/code&gt; together&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Policy drift is the silent killer.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Prune&lt;/strong&gt; old sessions/media if you use &lt;code&gt;maintenance&lt;/code&gt; / disk tools&lt;/td&gt;
&lt;td&gt;Stops unbounded &lt;code&gt;workspace/media&lt;/code&gt; and session stores.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Review&lt;/strong&gt; &lt;code&gt;logging.redactSensitive&lt;/code&gt; and &lt;code&gt;redactPatterns&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Add patterns for new PII you introduced (cities, domains, not only phone regex).&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
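
&lt;p&gt;A minimal sketch of the logging corner of &lt;code&gt;openclaw.json&lt;/code&gt; that the last habit refers to. Shape only: the key names come from the table above, but the value layout and the example patterns are assumptions; check your build’s schema before copying.&lt;/p&gt;

```json
"logging": {
  "redactSensitive": true,
  "redactPatterns": [
    "\\+\\d{7,15}",
    "my-home-town|my-employer\\.example"
  ]
}
```

&lt;p&gt;The point of the second pattern: redaction should grow with your life (cities, domains), not stay frozen at the phone regex.&lt;/p&gt;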




&lt;h3&gt;
  
  
  &lt;strong&gt;2. Safety Checklist (First Deploy + Ongoing)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Brain&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] One primary model; provider &lt;code&gt;baseUrl&lt;/code&gt; and env keys are correct&lt;/li&gt;
&lt;li&gt;[ ] &lt;code&gt;workspace&lt;/code&gt; path points at the folder you back up&lt;/li&gt;
&lt;li&gt;[ ] &lt;code&gt;compaction&lt;/code&gt; enabled if you have long threads&lt;/li&gt;
&lt;li&gt;[ ] &lt;code&gt;AGENTS.md&lt;/code&gt; / &lt;code&gt;user.md&lt;/code&gt; / &lt;code&gt;identity.md&lt;/code&gt; / &lt;code&gt;SOUL.md&lt;/code&gt; are &lt;strong&gt;short, aligned, and non-diary&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Voice&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] &lt;code&gt;silas-shield&lt;/code&gt; (or your equivalent) is enabled on the right agent&lt;/li&gt;
&lt;li&gt;[ ] &lt;code&gt;hash.py&lt;/code&gt; has &lt;code&gt;${SILAS_SALT}&lt;/code&gt; in the process environment, not in prompts&lt;/li&gt;
&lt;li&gt;[ ] &lt;code&gt;shield.py&lt;/code&gt; checks are wired the way &lt;em&gt;your&lt;/em&gt; OpenClaw build expects (hooks, commands)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Senses&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] &lt;code&gt;openai-image-gen&lt;/code&gt; denied until you &lt;em&gt;want&lt;/em&gt; it&lt;/li&gt;
&lt;li&gt;[ ] &lt;code&gt;tools.media&lt;/code&gt; default deny + allow rules for &lt;strong&gt;only&lt;/strong&gt; the threads you trust&lt;/li&gt;
&lt;li&gt;[ ] &lt;code&gt;mediaMaxMb&lt;/code&gt; matches your real usage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Connection&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] &lt;code&gt;channels.whatsapp.enabled&lt;/code&gt; + &lt;code&gt;allowFrom&lt;/code&gt; + &lt;code&gt;dmPolicy&lt;/code&gt; + &lt;code&gt;groupPolicy&lt;/code&gt; match your life&lt;/li&gt;
&lt;li&gt;[ ] &lt;code&gt;gateway&lt;/code&gt; bind mode matches threat model (loopback by default; widen only on purpose)&lt;/li&gt;
&lt;li&gt;[ ] &lt;code&gt;debounceMs&lt;/code&gt; high enough to stop duplicate work, low enough to feel live&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;This series does not list your phone numbers, tokens, or keys&lt;/strong&gt;. The checklist is about &lt;em&gt;the shape of a healthy install&lt;/em&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
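
&lt;p&gt;For the &lt;strong&gt;Brain&lt;/strong&gt; items, a shape-only sketch. The key names mirror the checklist; the exact nesting and value shapes are assumptions, not a schema reference.&lt;/p&gt;

```json
{
  "model": "one-primary-model",
  "provider": { "baseUrl": "https://api.example.com/v1" },
  "workspace": "/path/you/actually/back/up",
  "compaction": { "enabled": true }
}
```

&lt;p&gt;Provider keys and &lt;code&gt;${OPENCLAW_GATEWAY_TOKEN}&lt;/code&gt; belong in the process environment, never in a committed copy of this file.&lt;/p&gt;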




&lt;h3&gt;
  
  
  &lt;strong&gt;3. Future Outlook: Translation and Lingodotdev&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The Shield implementation keeps translation in the Python stack, as planned; JS shims exist for a future Lingodotdev path. A sane roadmap:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;First:&lt;/strong&gt; get &lt;strong&gt;local&lt;/strong&gt; &lt;code&gt;shield.py&lt;/code&gt; + &lt;code&gt;pre_screener.py&lt;/code&gt; + &lt;code&gt;script_detector.py&lt;/code&gt; correct — zero marginal API cost, deterministic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Then:&lt;/strong&gt; add optional Lingo (or any translation service) &lt;em&gt;only&lt;/em&gt; for messages that pass the cheap gates and you explicitly budget for&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Never:&lt;/strong&gt; send entire conversation history to translation; translate &lt;strong&gt;candidate spans&lt;/strong&gt; with redaction&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Cultural nuance (again):&lt;/strong&gt; translation is a &lt;strong&gt;user-experience&lt;/strong&gt; tool, not a &lt;strong&gt;security&lt;/strong&gt; primitive. The policy still comes from the skill + &lt;code&gt;SOUL.md&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;4. Remote Access: Tailscale vs Expose Port&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;gateway.tailscale&lt;/code&gt; exists in the schema as a switch; mine is &lt;code&gt;off&lt;/code&gt; today. The trade is familiar:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Off / loopback&lt;/strong&gt;: best default for a home install&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tailscale (or same VPN)&lt;/strong&gt;: reach the gateway from your phone &lt;em&gt;without&lt;/em&gt; public port 18789&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Raw public port&lt;/strong&gt;: only with additional auth, rate limits, and the expectation of scrapers&lt;/li&gt;
&lt;/ul&gt;
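
&lt;p&gt;In config terms, the three options above differ in only a few fields of the &lt;code&gt;gateway&lt;/code&gt; block (same shape as the block shown in part 4; these values are the conservative defaults I run):&lt;/p&gt;

```json
"gateway": {
  "bind": "loopback",
  "tailscale": { "mode": "off", "resetOnExit": false }
}
```

&lt;p&gt;Flipping &lt;code&gt;tailscale.mode&lt;/code&gt; on is the VPN row; widening &lt;code&gt;bind&lt;/code&gt; beyond loopback is the raw-public-port row, which then needs the additional auth and rate limits named above.&lt;/p&gt;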

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Practical Guide&lt;/em&gt; rule: &lt;strong&gt;never ship “security by obscurity on port 18789.”&lt;/strong&gt; If it is on the internet, it must assume it is &lt;em&gt;scanned&lt;/em&gt;.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;5. Ecosystem and Ethos&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;OpenClaw and projects like a personal “Clawdbot” show the same idea: the operator owns the stack, the model is a component, and &lt;strong&gt;policy is code + markdown you can read&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;em&gt;Practical Guide&lt;/em&gt; series is my contribution for &lt;strong&gt;first-time&lt;/strong&gt; builders: you do not need a novel architecture on day one. You need a &lt;strong&gt;boring, testable, layered&lt;/strong&gt; one: Brain, Voice, Senses, Connection.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusion:&lt;/strong&gt; ship small, log carefully, &lt;strong&gt;deny by default&lt;/strong&gt;, and treat every new channel as a new &lt;strong&gt;firewall&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Series (reading order)&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://dev.to/nadinev/the-brain-setting-up-openclaw-jd9"&gt;&lt;em&gt;(The Brain) Setting Up OpenClaw.txt&lt;/em&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/nadinev/the-voice-multilingual-layer-4mf0"&gt;&lt;em&gt;(The-Voice) MultilingualLayer&lt;/em&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/nadinev/the-senses-image-generation-media-266k"&gt;&lt;em&gt;(The Senses) Image Generation and Media&lt;/em&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href="https://dev.to/nadinev/the-connection-whatsapp-bridge-1962"&gt;(The Connection) WhatsApp Bridge&lt;/a&gt;&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href="https://dev.to/nadinev/future-outlook-operating-model-8jp"&gt;This Article: Future Outlook and Operating Model&lt;/a&gt;&lt;/em&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;&lt;strong&gt;Further Reading&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://1688.pixel-geist.co.za/1" rel="noopener noreferrer"&gt;OpenClaw Skill Shield: Multilingual Edition&lt;/a&gt; - a standalone deep dive into Silas, PII handling, multilingual gaps, and config tables.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>devops</category>
      <category>openclaw</category>
    </item>
    <item>
      <title>(The Connection) WhatsApp Bridge</title>
      <dc:creator>Nadine </dc:creator>
      <pubDate>Mon, 27 Apr 2026 02:05:47 +0000</pubDate>
      <link>https://dev.to/nadinev/the-connection-whatsapp-bridge-1962</link>
      <guid>https://dev.to/nadinev/the-connection-whatsapp-bridge-1962</guid>
      <description>&lt;h3&gt;
  
  
  &lt;strong&gt;The Catalyst: The Interface Is the Attack Surface&lt;/strong&gt;
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;WhatsApp is the ultimate messaging interface: it is on every phone and it is end-to-end encrypted. The brain can be perfect; the &lt;strong&gt;connection&lt;/strong&gt; is where pairing, allowlists, and gateway auth decide who gets to talk to the bot at all.&lt;/p&gt;

&lt;p&gt;Phase 4 of the Practical Guide series is &lt;strong&gt;The Connection&lt;/strong&gt;: gateway, plugin, channel policy, DM scoping, and groups. I run the gateway on the host (no Docker in my day-to-day path).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Overview&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Parts 1 to 3 already gave you a &lt;strong&gt;model&lt;/strong&gt;, &lt;strong&gt;Silas / policy&lt;/strong&gt;, and &lt;strong&gt;media scopes&lt;/strong&gt;. This article is where those meet the &lt;strong&gt;real wire&lt;/strong&gt;: who may send messages, how the OpenClaw &lt;strong&gt;gateway&lt;/strong&gt; sees them, and how &lt;strong&gt;session keys&lt;/strong&gt; line up with &lt;code&gt;tools.media&lt;/code&gt; &lt;code&gt;keyPrefix&lt;/code&gt; rules from the Senses article. Expect &lt;strong&gt;more channel wiring&lt;/strong&gt; here than in part 1, not a second lecture on the same &lt;code&gt;openclaw.json&lt;/code&gt; fields from scratch.&lt;/p&gt;

&lt;p&gt;My configuration enables the WhatsApp plugin, constrains the gateway, and makes session isolation explicit at the &lt;strong&gt;session&lt;/strong&gt; layer in addition to skills:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Area&lt;/th&gt;
&lt;th&gt;My settings (concept)&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;plugins.entries.whatsapp&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;enabled: true&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Turns on the channel integration.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;channels.whatsapp&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;enabled&lt;/code&gt;, &lt;code&gt;dmPolicy&lt;/code&gt;, &lt;code&gt;selfChatMode&lt;/code&gt;, &lt;code&gt;allowFrom&lt;/code&gt;, &lt;code&gt;groupPolicy&lt;/code&gt;, &lt;code&gt;debounceMs&lt;/code&gt;, &lt;code&gt;mediaMaxMb&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Who may DM, how groups are gated, and transport limits.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;session.dmScope&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;per-channel-peer&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;DMs are not one global blob; pair identity with the channel+peer.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;session.reset&lt;/code&gt; / &lt;code&gt;session.maintenance&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;idle reset + &lt;code&gt;pruneAfter&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Stops sessions from living forever in RAM/disk.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;gateway&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;port&lt;/code&gt;, &lt;code&gt;mode&lt;/code&gt;, &lt;code&gt;bind&lt;/code&gt;, &lt;code&gt;auth&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Where the local gateway listens and how clients authenticate.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Host process&lt;/td&gt;
&lt;td&gt;e.g. &lt;code&gt;gateway.cmd&lt;/code&gt; + Node on Windows&lt;/td&gt;
&lt;td&gt;I start OpenClaw’s gateway as a &lt;strong&gt;normal process&lt;/strong&gt; on the machine that owns &lt;code&gt;.openclaw/&lt;/code&gt;. No container in my run.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;No live secrets in documentation.&lt;/strong&gt; Use &lt;code&gt;${OPENCLAW_GATEWAY_TOKEN}&lt;/code&gt; in examples; generate a long random token and never paste it into chat or public repos.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  In this section:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;1. Gateway: Port, Loopback, Token Auth&lt;/li&gt;
&lt;li&gt;2. How I run the gateway (no Docker)&lt;/li&gt;
&lt;li&gt;3. Channel: DMs, Pairing, and Allowlists&lt;/li&gt;
&lt;li&gt;4. Group Mentions: When the Bot Wakes Up&lt;/li&gt;
&lt;li&gt;5. Webhooks, Real Time, and Where the Phone Meets the Gateway&lt;/li&gt;
&lt;li&gt;6. session.dmScope and the Heartbeat of Trust&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;1. Gateway: Port, Loopback, Token Auth&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;A typical &lt;code&gt;gateway&lt;/code&gt; block (shape only):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"gateway"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"port"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;18789&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"local"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"bind"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"loopback"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"auth"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"mode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"token"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"token"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${OPENCLAW_GATEWAY_TOKEN}"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tailscale"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"mode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"off"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"resetOnExit"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;bind: "loopback"&lt;/code&gt;&lt;/strong&gt;: the gateway is not a wide-open LAN service by default. If you need remote access, that is a deliberate &lt;em&gt;Connection&lt;/em&gt; project (see Article-05: Tailscale vs public IP).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;auth.mode: "token"&lt;/code&gt;&lt;/strong&gt;: every client that hits the gateway must know the token. &lt;strong&gt;Rotate&lt;/strong&gt; if a token ever leaks; treat it like a password.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;gateway.nodes.denyCommands&lt;/code&gt;&lt;/strong&gt; (if present): I deny a set of high-impact device/calendar style commands at the node layer. Adjust to match what you are willing to expose from a phone bridge.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;On Windows,&lt;/strong&gt; you may have a &lt;code&gt;gateway.cmd&lt;/code&gt; that starts Node with the OpenClaw package. Do not commit this file to public repos; it often contains a resolved token. Prefer env-based injection for docs.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A &lt;strong&gt;plain text reply&lt;/strong&gt; from the “brain” can still take &lt;strong&gt;tens of seconds&lt;/strong&gt; on a slow model or a long context. On WhatsApp that feels like a hang. I already use &lt;strong&gt;&lt;code&gt;debounceMs&lt;/code&gt;&lt;/strong&gt; so one tap does not double-fire; if your &lt;strong&gt;bridge&lt;/strong&gt; exposes &lt;strong&gt;typing&lt;/strong&gt; or a &lt;strong&gt;read&lt;/strong&gt; signal, a short “thinking” state helps more than a faster logo. The fix is &lt;strong&gt;UX&lt;/strong&gt;, not more tokens in the system prompt.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;2. How I run the gateway (no Docker)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;I &lt;strong&gt;do not&lt;/strong&gt; run the OpenClaw gateway in Docker. The gateway is a &lt;strong&gt;local&lt;/strong&gt; process on the same machine as my &lt;code&gt;.openclaw&lt;/code&gt; tree, with env vars (including &lt;code&gt;${OPENCLAW_GATEWAY_TOKEN}&lt;/code&gt;) set the way a normal app expects. Some OpenClaw trees ship a &lt;strong&gt;sample&lt;/strong&gt; &lt;code&gt;docker-compose.yml&lt;/code&gt; for people who want a container; that is &lt;strong&gt;optional&lt;/strong&gt; and aimed at other deployments, not my path. If you adopt containers later, the &lt;strong&gt;shape&lt;/strong&gt; of &lt;code&gt;openclaw.json&lt;/code&gt; does not change; only &lt;strong&gt;how&lt;/strong&gt; you start the process does.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;3. Channel: DMs, Pairing, and Allowlists&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Key fields in &lt;code&gt;channels.whatsapp&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;dmPolicy: pairing&lt;/code&gt;&lt;/strong&gt;: unknown numbers should not get full access until a pairing/approval path completes (your OpenClaw version defines the exact UX).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;allowFrom&lt;/code&gt;&lt;/strong&gt;: E.164 allowlist. &lt;strong&gt;Yours&lt;/strong&gt; goes here; in shared docs, describe the &lt;em&gt;pattern&lt;/em&gt; (“owner + a trusted test number only”).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;selfChatMode: true&lt;/code&gt;&lt;/strong&gt;: useful when you are your own “first user” in the same app session.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;groupPolicy: "allowlist"&lt;/code&gt;&lt;/strong&gt;: groups are not open season; only listed groups (per product docs) should get bot participation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;debounceMs&lt;/code&gt;&lt;/strong&gt;: I use ~1500 ms to absorb double-tap sends and flappy connectivity before the agent does expensive work.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;mediaMaxMb&lt;/code&gt;&lt;/strong&gt;: cap attachments so the connection cannot be used as a free CDN stress test.&lt;/li&gt;
&lt;/ul&gt;
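
&lt;p&gt;Put together, a shape-only &lt;code&gt;channels.whatsapp&lt;/code&gt; block. Field names are the ones listed above; the &lt;code&gt;allowFrom&lt;/code&gt; entry is a placeholder and &lt;code&gt;mediaMaxMb&lt;/code&gt; is an example value, not a recommendation — verify spelling and nesting against your build.&lt;/p&gt;

```json
"channels": {
  "whatsapp": {
    "enabled": true,
    "dmPolicy": "pairing",
    "selfChatMode": true,
    "allowFrom": ["+1XXXXXXXXXX"],
    "groupPolicy": "allowlist",
    "debounceMs": 1500,
    "mediaMaxMb": 16
  }
}
```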

&lt;p&gt;&lt;strong&gt;Mental model:&lt;/strong&gt; &lt;em&gt;pairing&lt;/em&gt; + &lt;em&gt;allowlist&lt;/em&gt; = &lt;strong&gt;identity-based firewall&lt;/strong&gt; in front of the LLM.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;allowFrom&lt;/code&gt; and the “owner” in parts 2 and 3:&lt;/strong&gt; &lt;code&gt;channels.whatsapp.allowFrom&lt;/code&gt; is the &lt;strong&gt;E.164 allowlist&lt;/strong&gt; of who may talk to the bot on this channel. The &lt;strong&gt;“operator”&lt;/strong&gt; or &lt;strong&gt;“owner”&lt;/strong&gt; phrasing in &lt;code&gt;SKILL.md&lt;/code&gt; and &lt;strong&gt;owner-only&lt;/strong&gt; media in part 3 should match &lt;strong&gt;your&lt;/strong&gt; trusted thread: the same &lt;strong&gt;session identity&lt;/strong&gt; the bridge uses for &lt;strong&gt;your&lt;/strong&gt; DM, which is also the one you target with a &lt;code&gt;tools.media&lt;/code&gt; &lt;strong&gt;keyPrefix&lt;/strong&gt; like &lt;code&gt;whatsapp:direct:+1XXXXXXXXXX&lt;/code&gt; in examples (use &lt;strong&gt;your&lt;/strong&gt; real prefix in your private config, not a copy from a blog). Strangers in group chats that are not in your model do not get a casual path to that session.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Twilio and Business WhatsApp (my stack)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;I use &lt;strong&gt;Twilio&lt;/strong&gt; with a &lt;strong&gt;WhatsApp Business&lt;/strong&gt; number provisioned the way &lt;strong&gt;Twilio’s&lt;/strong&gt; WhatsApp product documents (this is &lt;strong&gt;not&lt;/strong&gt; the same step list as a raw &lt;strong&gt;Meta Cloud API&lt;/strong&gt; hand-build, and it is not a &lt;strong&gt;headless browser&lt;/strong&gt; / Puppeteer bridge). Pick &lt;strong&gt;one&lt;/strong&gt; guide and follow it end to end or you will mix credentials.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For beginners:&lt;/strong&gt; people on the &lt;strong&gt;allowlist&lt;/strong&gt; are still talking to &lt;strong&gt;your&lt;/strong&gt; WhatsApp identity. They will read the bot as &lt;strong&gt;you&lt;/strong&gt;, not a new contact card, so treat the allowlist as “who is allowed to make my number say agent output.”&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Groups:&lt;/strong&gt; the bot only speaks when the patterns match a &lt;strong&gt;mention&lt;/strong&gt; of the assistant (see &lt;strong&gt;Group Mentions&lt;/strong&gt; below and your &lt;code&gt;groupPolicy&lt;/code&gt;). It does not narrate the whole group by default.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;4. Group Mentions: When the Bot Wakes Up&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;messages.groupChat.mentionPatterns&lt;/code&gt; list includes variants of the assistant’s call name and casual “hey” forms, so the agent does not spam a whole group on every off-topic line.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;ackReactionScope: "group-mentions"&lt;/code&gt; (if supported in your build) keeps acknowledgement behaviour scoped.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Match patterns to the &lt;strong&gt;name&lt;/strong&gt; you set under &lt;code&gt;ui.assistant&lt;/code&gt; (I use &lt;code&gt;Clawd&lt;/code&gt; in the UI; patterns reference &lt;code&gt;@clawd&lt;/code&gt; and similar). Keep patterns short enough to be memorable, not so broad that every line triggers the model.&lt;/p&gt;
&lt;/blockquote&gt;
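
&lt;p&gt;A shape-only sketch of the mention wiring. The pattern strings are illustrative, and where &lt;code&gt;ackReactionScope&lt;/code&gt; nests depends on your build:&lt;/p&gt;

```json
"messages": {
  "groupChat": {
    "mentionPatterns": ["@clawd", "clawd:", "hey clawd"]
  },
  "ackReactionScope": "group-mentions"
}
```

&lt;p&gt;Three short, memorable patterns beat one broad regex: every false-positive match is a model call the whole group sees.&lt;/p&gt;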




&lt;h3&gt;
  
  
  &lt;strong&gt;5. Webhooks, Real Time, and Where the Phone Meets the Gateway&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Exact webhook URLs differ by host release. The &lt;strong&gt;contract&lt;/strong&gt; in any setup is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Provider or bridge&lt;/strong&gt; (WhatsApp / Meta / Baileys / etc.) posts events to your gateway.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gateway&lt;/strong&gt; authenticates, normalises, and hands messages to the agent with a stable &lt;strong&gt;session key&lt;/strong&gt; (the same family of keys used in the &lt;code&gt;tools.media&lt;/code&gt; &lt;code&gt;keyPrefix&lt;/code&gt; rules in part 3).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Session&lt;/strong&gt; and &lt;strong&gt;skill&lt;/strong&gt; policies apply: Shield, log redaction, tool allow/deny lists.&lt;/li&gt;
&lt;/ol&gt;
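&lt;p&gt;As a sketch, the normalised event the gateway hands the agent could look roughly like this (the field names are illustrative, not a documented schema; only the channel and session-key shape follow the contract above):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "channel": "whatsapp",
  "sessionKey": "whatsapp:direct:+1XXXXXXXXXX",
  "from": "+1XXXXXXXXXX",
  "type": "text",
  "body": "hello",
  "receivedAt": "2026-04-27T02:08:00Z"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;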

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Troubleshooting without panic:&lt;/strong&gt; if messages stop flowing, check (port → token → channel enabled → allowFrom → plugin enabled), in that order. Nine times out of ten it is a stale &lt;strong&gt;token&lt;/strong&gt; or a gateway &lt;strong&gt;restarted&lt;/strong&gt; without the same env as before.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;6. session.dm Scope and Heartbeat of Trust&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;per-channel-peer&lt;/code&gt; means: “&lt;strong&gt;this&lt;/strong&gt; DM thread is not &lt;strong&gt;that&lt;/strong&gt; DM thread”. This is important when you later have more than one human talking to the same bot account.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pair with&lt;/strong&gt; maintenance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;session.reset.idleMinutes&lt;/code&gt;: lets long-idle DMs reset context predictably&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;session.maintenance&lt;/code&gt;: prunes sessions after a defined window (e.g. 7d) if you do not need infinite retention&lt;/li&gt;
&lt;/ul&gt;
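&lt;p&gt;Put together, a hedged sketch of the session block (the 240 minutes is just an example, and the &lt;code&gt;pruneAfter&lt;/code&gt; key name is an assumption; use whatever your build’s &lt;code&gt;session.maintenance&lt;/code&gt; schema actually calls it):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;"session": {
  "dmScope": "per-channel-peer",
  "reset": { "idleMinutes": 240 },
  "maintenance": { "pruneAfter": "7d" }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;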

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The Connection article&lt;/strong&gt; is not the &lt;strong&gt;Voice&lt;/strong&gt; (Shield) article. Both are needed: wiring decides who &lt;em&gt;may&lt;/em&gt; connect; policy decides what they &lt;em&gt;may&lt;/em&gt; do once connected.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Conclusion (Phase 4):&lt;/strong&gt; treat WhatsApp as a &lt;strong&gt;public API&lt;/strong&gt; to your home lab. &lt;strong&gt;Pair, allowlist, token-auth the gateway, debounce, and cap media&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Series navigation&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Previous: &lt;em&gt;&lt;a href="https://dev.to/nadinev/the-senses-image-generation-media-266k"&gt;The Senses&lt;/a&gt;&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Next: &lt;em&gt;&lt;a href="https://dev.to/nadinev/future-outlook-operating-model-8jp"&gt;Future Outlook &amp;amp; Operating Model&lt;/a&gt;&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>agents</category>
      <category>architecture</category>
      <category>security</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>(The Senses) Image Generation &amp; Media</title>
      <dc:creator>Nadine </dc:creator>
      <pubDate>Mon, 27 Apr 2026 02:03:51 +0000</pubDate>
      <link>https://dev.to/nadinev/the-senses-image-generation-media-266k</link>
      <guid>https://dev.to/nadinev/the-senses-image-generation-media-266k</guid>
      <description>&lt;h3&gt;
  
  
  &lt;strong&gt;The Catalyst&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Field note: Nano Banana Pro and reactive image gen&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I hit a real workflow failure mode: a &lt;strong&gt;proactive&lt;/strong&gt; image stack (Nano Banana Pro) that would &lt;strong&gt;spontaneously&lt;/strong&gt; generate images while the agent was effectively listening in on a chat. Worse, it would &lt;strong&gt;regenerate images people had shared&lt;/strong&gt; in the conversation. That was the biggest day-to-day nuisance, and a big part of why I went with OpenAI’s image API &lt;strong&gt;(DALL·E)&lt;/strong&gt; path: &lt;strong&gt;only generate when explicitly asked&lt;/strong&gt;, not because the conversation &lt;em&gt;suggested&lt;/em&gt; something visual.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Eyes that see too much&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The moment you add &lt;strong&gt;images, audio, and video&lt;/strong&gt;, the model can see your camera roll path, a cached thumbnail, or a viral meme. A malicious payload can be &lt;strong&gt;in&lt;/strong&gt; the image, not the caption. I wanted senses (multimedia) without giving the model &lt;strong&gt;surveillance&lt;/strong&gt; over my disk or a blank cheque to generate images for strangers.&lt;/p&gt;

&lt;p&gt;Phase 3 of the series is &lt;strong&gt;The Senses&lt;/strong&gt;: how OpenClaw exposes images, how you &lt;strong&gt;deny&lt;/strong&gt; image-generation tools by default, how &lt;strong&gt;allow-scopes&lt;/strong&gt; work per channel, and how to keep users engaged while a heavy operation runs.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Covered in other articles:&lt;/strong&gt; identity leakage via workspace files and cached media (see &lt;em&gt;&lt;a href="https://1688.pixel-geist.co.za/1" rel="noopener noreferrer"&gt;OpenClaw Skill Shield&lt;/a&gt;&lt;/em&gt; and &lt;em&gt;&lt;a href="https://dev.to/nadinev/the-brain-setting-up-openclaw-1p4i-temp-slug-7665266?preview=43b5e5eb7fd5a835a0c1c38d54196ee9c20d257afa4ccfbcdacc18cd941c0d21a54574b7dd46e55daddc9664730c91be2d64dae5791a9831358805ee"&gt;Setting up OpenClaw&lt;/a&gt;&lt;/em&gt;). Here the focus is &lt;strong&gt;tooling and config&lt;/strong&gt; for multimedia.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Overview&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;In my &lt;code&gt;openclaw.json&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;tools.deny&lt;/code&gt;&lt;/strong&gt; includes &lt;code&gt;openai-image-gen&lt;/code&gt; at the top level, so the model has no casual path to DALL·E tools even if the skill package is installed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;tools.media&lt;/code&gt;&lt;/strong&gt; enables image, audio, and video, each with a &lt;strong&gt;default deny&lt;/strong&gt; and &lt;strong&gt;explicit allow rules&lt;/strong&gt; that match a channel and a &lt;code&gt;keyPrefix&lt;/code&gt; (e.g. your owner WhatsApp direct thread key, expressed as a &lt;strong&gt;placeholder&lt;/strong&gt; in docs).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;skills.entries.openai-image-gen&lt;/code&gt;&lt;/strong&gt; can still hold &lt;code&gt;${OPENAI_API_KEY}&lt;/code&gt; for when you &lt;em&gt;deliberately&lt;/em&gt; re-enable the skill in a controlled way.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;strong&gt;Silas&lt;/strong&gt; skill (&lt;code&gt;SKILL.md&lt;/code&gt;) adds behavioural law: do not call image-gen tools for non-operator sessions, treat blocked vision input as blocked, and never guess the pixels.&lt;/p&gt;




&lt;h3&gt;
  
  
  In this section:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;1. Image Generation: Deny First, Enable Deliberately&lt;/li&gt;
&lt;li&gt;2. Inbound Media: Scopes, Not “On for the World”&lt;/li&gt;
&lt;li&gt;3. Filesystem: Workspace-Only&lt;/li&gt;
&lt;li&gt;4. Latency: Keeping Humans Calm While “Senses” Work&lt;/li&gt;
&lt;li&gt;5. Checklist: Senses in Production&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;1. Image Generation: Deny First, Enable Deliberately&lt;/strong&gt;
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mechanism&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;tools.deny: ["openai-image-gen"]&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;A global deny list removes the tool from the agent’s easy reach.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Skill config &lt;code&gt;openai-image-gen.apiKey&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;When you &lt;em&gt;do&lt;/em&gt; enable, keys live in env, not in chat logs.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;SKILL.md&lt;/code&gt; image-gen section&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Behavioural&lt;/strong&gt; backstop: even if a tool slipped through, the model is instructed to refuse for non-operator contacts.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;New-user default:&lt;/strong&gt; start with &lt;code&gt;openai-image-gen&lt;/code&gt; &lt;strong&gt;denied&lt;/strong&gt; until you have (a) a billing/usage cap you accept, and (b) a clear “who may request images” policy (owner session vs everyone). The &lt;strong&gt;Connection&lt;/strong&gt; article (part 4) explains how my &lt;strong&gt;WhatsApp bridge&lt;/strong&gt; maps &lt;code&gt;allowFrom&lt;/code&gt; and session keys to &lt;strong&gt;who counts as the operator&lt;/strong&gt;, so that “owner-only” in config and in &lt;code&gt;SKILL.md&lt;/code&gt; means the same person in practice.&lt;/p&gt;
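&lt;p&gt;The deny-first shape from the table, sketched as config (the key paths follow the &lt;code&gt;tools.deny&lt;/code&gt; and &lt;code&gt;skills.entries&lt;/code&gt; naming used in this series; adjust to your build):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;"tools": {
  "deny": ["openai-image-gen"]
},
"skills": {
  "entries": {
    "openai-image-gen": {
      "apiKey": "${OPENAI_API_KEY}"
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;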




&lt;h3&gt;
  
  
  &lt;strong&gt;2. Inbound Media: Scopes, Not “On for the World”&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;tools.media&lt;/code&gt; for &lt;code&gt;image&lt;/code&gt; / &lt;code&gt;audio&lt;/code&gt; / &lt;code&gt;video&lt;/code&gt; shares the same pattern:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;"default": "deny"&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;"rules"&lt;/code&gt;: one or more &lt;code&gt;{ "action": "allow", "match": { "channel": "whatsapp", "keyPrefix": "..." } }&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What &lt;code&gt;keyPrefix&lt;/code&gt; means in practice:&lt;/strong&gt; it is a channel-specific routing key. Your OpenClaw build should document the exact string format; treat it as a &lt;strong&gt;capability&lt;/strong&gt;: only the threads you list get inbound multimodal access at the tool layer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example (use your own key prefix, not a copy-paste of someone’s phone number):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"media"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"image"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"enabled"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"scope"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"default"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"deny"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"rules"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"allow"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"match"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"channel"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"whatsapp"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"keyPrefix"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"whatsapp:direct:+1XXXXXXXXXX"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Repeat the same idea for &lt;code&gt;audio&lt;/code&gt; and &lt;code&gt;video&lt;/code&gt; if you want symmetric behaviour. If a modality should stay &lt;strong&gt;off entirely&lt;/strong&gt;, set &lt;code&gt;enabled: false&lt;/code&gt; for that block instead of relying on empty rules.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;channels.whatsapp.mediaMaxMb&lt;/code&gt;:&lt;/strong&gt; set an upper bound (my config uses 50 MB) so a single “document as video” cannot exhaust disk or the gateway.&lt;/p&gt;
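&lt;p&gt;Sketched as config (50 MB is my choice, not a recommended constant):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;"channels": {
  "whatsapp": {
    "mediaMaxMb": 50
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;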




&lt;h3&gt;
  
  
  &lt;strong&gt;3. Filesystem: Workspace-Only&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;tools.fs.workspaceOnly: true&lt;/code&gt; means the model’s file tools are anchored to the configured workspace, not an arbitrary path. That pairs with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Inbound media cache living under your OpenClaw media areas (separate from random OS paths, depending on your build)&lt;/li&gt;
&lt;li&gt;Outbound or generated files you intentionally place under &lt;code&gt;workspace/media/...&lt;/code&gt; when you &lt;em&gt;want&lt;/em&gt; the agent to reference them&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Practical guide rule:&lt;/strong&gt; if the LLM can &lt;em&gt;read&lt;/em&gt; a file, assume it can be &lt;strong&gt;summarised or exfiltrated&lt;/strong&gt; unless session + skills forbid it. Deny is the default; allow is a contract.&lt;/p&gt;
&lt;/blockquote&gt;
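&lt;p&gt;The anchoring itself is one flag; a sketch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;"tools": {
  "fs": {
    "workspaceOnly": true
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;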




&lt;h3&gt;
  
  
  &lt;strong&gt;4. Latency: Keeping Humans Calm While “Senses” Work&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; A minute of silence feels like a dropped message, especially on WhatsApp.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Patterns that work:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;ACK early&lt;/strong&gt; where your channel allows it (reactions, short “Received, processing” copy).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chunk work:&lt;/strong&gt; transcribe or describe in stages, not one giant block at the end.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Set expectations in &lt;code&gt;SOUL.md&lt;/code&gt; / identity:&lt;/strong&gt; the assistant can say it may take a few seconds for audio or large images.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Debounce (channel):&lt;/strong&gt; a longer &lt;code&gt;debounceMs&lt;/code&gt; on the WhatsApp channel reduces double-firing on slow networks. You trade a little latency for fewer duplicate heavy jobs. See the Connection article for &lt;code&gt;debounceMs&lt;/code&gt; as wiring, not as a speed hack.&lt;/li&gt;
&lt;/ul&gt;
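&lt;p&gt;The debounce knob sits on the channel config; a sketch (the 2000 ms value is illustrative, tune it to your network):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;"channels": {
  "whatsapp": {
    "debounceMs": 2000
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;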

&lt;p&gt;&lt;strong&gt;Reality check:&lt;/strong&gt; &lt;em&gt;fast model&lt;/em&gt; + &lt;em&gt;large media&lt;/em&gt; still hits API limits. The UX fix is &lt;strong&gt;communication&lt;/strong&gt;, not overpromising in the system prompt.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A cultural note (ties to the Voice article):&lt;/strong&gt; when replying in a second language, a short &lt;em&gt;localised&lt;/em&gt; “working on it” line often lands better than English.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;5. Checklist: Senses in Production&lt;/strong&gt;
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Check&lt;/th&gt;
&lt;th&gt;You want&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Image gen&lt;/td&gt;
&lt;td&gt;Deny tool globally until policy is explicit.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Inbound image/audio/video&lt;/td&gt;
&lt;td&gt;Default deny; allow only named channel + key prefix.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model behaviour&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;SKILL.md&lt;/code&gt; matches config (no “secret” image gen path).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Disk and limits&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;mediaMaxMb&lt;/code&gt; sane; monitor &lt;code&gt;workspace/media&lt;/code&gt; growth.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;User trust&lt;/td&gt;
&lt;td&gt;Early ACK + honest latency messaging.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Conclusion (Phase 3)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The Senses are optional superpowers. &lt;strong&gt;Default closed, open on purpose, behaviourally enforced&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Series navigation&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Previous: &lt;em&gt;&lt;a href="https://dev.to/nadinev/the-voice-multilingual-layer-4mf0"&gt;The Voice&lt;/a&gt;&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Next: &lt;em&gt;&lt;a href="https://dev.to/nadinev/the-connection-whatsapp-bridge-1962"&gt;The Connection (WhatsApp Bridge)&lt;/a&gt;&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>openai</category>
      <category>privacy</category>
      <category>security</category>
    </item>
    <item>
      <title>(The Voice) Multilingual Layer</title>
      <dc:creator>Nadine </dc:creator>
      <pubDate>Mon, 27 Apr 2026 02:02:37 +0000</pubDate>
      <link>https://dev.to/nadinev/the-voice-multilingual-layer-4mf0</link>
      <guid>https://dev.to/nadinev/the-voice-multilingual-layer-4mf0</guid>
      <description>&lt;h3&gt;
  
  
  &lt;strong&gt;The Catalyst: One Language, Many Attack Surfaces&lt;/strong&gt;
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;The comfortable fiction is: “We wrote English rules, so the model is safe.” The truth: LLMs are multilingual. A user can request the same jailbreak in another script, mix Latin keywords into CJK text, or hide instructions behind homoglyphs. If your policy lives only in English sentences, you have not policed the channel.&lt;/p&gt;

&lt;p&gt;Phase 2 of the &lt;em&gt;Practical Guide&lt;/em&gt; series is the &lt;strong&gt;Voice&lt;/strong&gt; layer: how to handle multiple languages and cultural nuance without giving attackers a free pass. The implementation detail is &lt;strong&gt;Silas Shield&lt;/strong&gt; (&lt;code&gt;silas-shield&lt;/code&gt;); the narrative is &lt;strong&gt;Language Sentry&lt;/strong&gt;. The same rules apply to every language.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Overview&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Skill Shield (Silas)&lt;/strong&gt; in my setup is a drop-in OpenClaw skill: &lt;code&gt;SKILL.md&lt;/code&gt; enforces vision rules, PII hashing, image-gen lockdown, cross-session isolation, and multilingual injection defence. The Python entry points (&lt;code&gt;shield.py&lt;/code&gt;, &lt;code&gt;script_detector.py&lt;/code&gt;, &lt;code&gt;pre_screener.py&lt;/code&gt;, &lt;code&gt;hash.py&lt;/code&gt;) run &lt;em&gt;locally&lt;/em&gt; for message checks. This is cheap and predictable compared with burning another LLM call per message.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Token budget (what actually burns money):&lt;/strong&gt; &lt;code&gt;shield.py&lt;/code&gt; runs on the &lt;strong&gt;host&lt;/strong&gt; before you spend model tokens on a bad message. The main &lt;strong&gt;context window&lt;/strong&gt; and &lt;strong&gt;compaction&lt;/strong&gt; you set in part 1 still decide how much &lt;strong&gt;history&lt;/strong&gt; the LLM sees. Silas is not, in my setup, a second hidden prompt that stacks on top of every reply and eats the 1M/day line by itself.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;This article does not replace&lt;/strong&gt; &lt;a href="https://1688.pixel-geist.co.za/1"&gt;&lt;em&gt;OpenClaw Skill Shield: Multilingual Edition&lt;/em&gt;&lt;/a&gt;. &lt;strong&gt;This guide orients new readers to the same architecture.&lt;/strong&gt; For module-by-module behaviour and test commands, use that piece as the blueprint.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;The multilingual gap (recap):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Default safety is often &lt;strong&gt;English&lt;/strong&gt;. Your friends are not.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code-switching&lt;/strong&gt; mid-message is a real technique to slip past naïve filters.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Homoglyphs&lt;/strong&gt; (Cyrillic &lt;em&gt;а&lt;/em&gt; for Latin &lt;em&gt;a&lt;/em&gt;) defeat string matching unless you normalize first.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Non-Latin + embedded Latin&lt;/strong&gt; can hide “ignore all instructions” inside an otherwise “foreign” blob. The pre-screener’s job is to treat that as suspicious, not to auto-block every greeting.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  In this section:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;1. How Silas Speaks to the Model&lt;/li&gt;
&lt;li&gt;2. The Detection Stack (Mental Model)&lt;/li&gt;
&lt;li&gt;3. Language Switching vs Context&lt;/li&gt;
&lt;li&gt;4. Key Takeaway Table (Voice Layer)&lt;/li&gt;
&lt;li&gt;Conclusion (Phase 2)&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;1. How Silas Speaks to the Model&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;SKILL.md&lt;/code&gt; (front matter &lt;code&gt;name: silas-shield&lt;/code&gt;, &lt;code&gt;always: true&lt;/code&gt; when configured that way) tells the agent to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run &lt;strong&gt;PII&lt;/strong&gt; through &lt;code&gt;hash.py&lt;/code&gt; with &lt;code&gt;${SILAS_SALT}&lt;/code&gt; in the environment&lt;/li&gt;
&lt;li&gt;Obey &lt;strong&gt;vision blinding&lt;/strong&gt; when content is marked blocked&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Never&lt;/strong&gt; call image-generation tools for non-operator sessions unless the operator clearly requested it in the right context&lt;/li&gt;
&lt;li&gt;Never leak &lt;strong&gt;across&lt;/strong&gt; WhatsApp (or other) sessions&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;&lt;code&gt;shield.py check --message "..." --json&lt;/code&gt;&lt;/strong&gt; when you need a structured allow/deny signal&lt;/li&gt;
&lt;/ul&gt;
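&lt;p&gt;A sketch of the skill entry that pairs with this (the &lt;code&gt;env&lt;/code&gt; key name is an assumption about the build; the point is that &lt;code&gt;${SILAS_SALT}&lt;/code&gt; comes from the environment, never from chat):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;"skills": {
  "entries": {
    "silas-shield": {
      "env": { "SILAS_SALT": "${SILAS_SALT}" }
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;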

&lt;p&gt;The &lt;em&gt;behaviour&lt;/em&gt; section of your workspace (e.g. &lt;code&gt;SOUL.md&lt;/code&gt; + &lt;code&gt;identity.md&lt;/code&gt;) should &lt;strong&gt;repeat the Language Sentry intent&lt;/strong&gt; in plain language so the model does not treat security as a side channel only the skill file knows about.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;2. The Detection Stack (Mental Model)&lt;/strong&gt;
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;File&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Script inventory&lt;/td&gt;
&lt;td&gt;&lt;code&gt;script_detector.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Non-Latin script detection across many Unicode ranges; homoglyph map and normalization.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Suspicion heuristics&lt;/td&gt;
&lt;td&gt;&lt;code&gt;pre_screener.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Token-ish estimates, “safe short” greetings, mixed-script and embedded-Latin patterns, long-message flags.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Orchestration&lt;/td&gt;
&lt;td&gt;&lt;code&gt;shield.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Homoglyph path → non-Latin path → pre-screen → keyword / block decisions; CLI and JSON.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PII output&lt;/td&gt;
&lt;td&gt;&lt;code&gt;hash.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Salted short hex digest so you never print raw PII.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Planned / optional:&lt;/strong&gt; JS siblings (&lt;code&gt;openclaw-shield.js&lt;/code&gt;, &lt;code&gt;openclaw-shield-lingo.js&lt;/code&gt;) for a future Lingo.dev pipeline — same as noted in the WhatsApp Bot article.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Example CLI (from the Shield article pattern — run from your skill directory):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python shield.py check &lt;span class="nt"&gt;--message&lt;/span&gt; &lt;span class="s2"&gt;"Hello world"&lt;/span&gt; &lt;span class="nt"&gt;--json&lt;/span&gt;
python shield.py check &lt;span class="nt"&gt;--message&lt;/span&gt; &lt;span class="s2"&gt;"你好"&lt;/span&gt; &lt;span class="nt"&gt;--json&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Block vs allow semantics are in the JSON fields (&lt;code&gt;allowed&lt;/code&gt;, &lt;code&gt;reason&lt;/code&gt;, &lt;code&gt;has_non_latin&lt;/code&gt;, &lt;code&gt;homoglyphs_detected&lt;/code&gt;, &lt;code&gt;pre_screen_result&lt;/code&gt;, etc.).&lt;/p&gt;
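&lt;p&gt;For orientation, a response to the second command might look roughly like this (the field names are the ones listed above; the values, including the &lt;code&gt;reason&lt;/code&gt; string, are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "allowed": true,
  "reason": "short safe greeting",
  "has_non_latin": true,
  "homoglyphs_detected": false,
  "pre_screen_result": "allow"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;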




&lt;h3&gt;
  
  
  &lt;strong&gt;3. Language Switching vs Context&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Cultural nuance in WhatsApp:&lt;/strong&gt; reply in the &lt;em&gt;user’s&lt;/em&gt; language when possible, but &lt;strong&gt;never&lt;/strong&gt; treat a language change as permission to override privacy or system rules. The Shield article calls this out: code-switching is adversarial until proven otherwise.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Practical tips:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Short safe greetings&lt;/strong&gt; (e.g. a few CJK characters) are allowed through when they match pre-screener “known safe” style patterns; long or mixed-script blasts are treated as higher risk.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Intent wins over literal script:&lt;/strong&gt; if the &lt;em&gt;intent&lt;/em&gt; is injection, the channel should &lt;strong&gt;block&lt;/strong&gt; and not “debate in Chinese about whether it was a joke.”&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Operator vs contact:&lt;/strong&gt; your &lt;code&gt;SOUL.md&lt;/code&gt; / skill rules can allow the operator a different failure mode (e.g. more debugging) than anonymous contacts. Keep that difference explicit in docs so you do not conflate the two in testing.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;4. Key Takeaway Table (Voice Layer)&lt;/strong&gt;
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Concern&lt;/th&gt;
&lt;th&gt;Where it lives&lt;/th&gt;
&lt;th&gt;New-user action&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Multilingual policy&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;SOUL.md&lt;/code&gt; + &lt;code&gt;SKILL.md&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Align both; do not maintain two contradictory rule sets.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Injection + homoglyphs&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;silas-shield&lt;/code&gt; Python&lt;/td&gt;
&lt;td&gt;Wire checks into hooks or the message path per OpenClaw’s hook model.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PII in answers&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;hash.py&lt;/code&gt; + skill text&lt;/td&gt;
&lt;td&gt;Refuse raw output if hash fails.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross-session leaks&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;session.dmScope&lt;/code&gt; + rules&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;per-channel-peer&lt;/code&gt; is a baseline; see Connection article.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Conclusion (Phase 2)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The Voice of your agent is not accent or emoji, it is &lt;strong&gt;consistency of policy across every script and every contact&lt;/strong&gt;. Silas is my concrete implementation; your implementation may differ, but the &lt;em&gt;contract&lt;/em&gt; is fixed: no language is a “free pass,” and the cheapest enforcement runs &lt;strong&gt;locally&lt;/strong&gt; before the LLM spends another token.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Series navigation&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Previous: &lt;em&gt;&lt;a href="https://dev.to/nadinev/the-brain-setting-up-openclaw-jd9"&gt;The Brain&lt;/a&gt;&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Next: &lt;em&gt;&lt;a href="https://dev.to/nadinev/the-senses-image-generation-media-266k"&gt;The Senses (Image Gen &amp;amp; Media)&lt;/a&gt;&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Skill Shield deep dive:&lt;/strong&gt; the full write-up &lt;em&gt;OpenClaw Skill Shield: Multilingual Edition&lt;/em&gt; (&lt;a href="https://1688.pixel-geist.co.za/1" rel="noopener noreferrer"&gt;https://1688.pixel-geist.co.za/1&lt;/a&gt;). Identity leakage and the multilingual gap sit there if you want every config table in one place.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>nlp</category>
      <category>security</category>
    </item>
    <item>
      <title>(The Brain) Setting Up OpenClaw</title>
      <dc:creator>Nadine </dc:creator>
      <pubDate>Mon, 27 Apr 2026 02:00:59 +0000</pubDate>
      <link>https://dev.to/nadinev/the-brain-setting-up-openclaw-jd9</link>
      <guid>https://dev.to/nadinev/the-brain-setting-up-openclaw-jd9</guid>
      <description>&lt;h3&gt;
  
  
  &lt;strong&gt;The Catalyst: Intent First&lt;/strong&gt;
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;I wanted an assistant that &lt;em&gt;understood&lt;/em&gt; the job before it opened its mouth: stable model, bounded context, a workspace the agent is allowed to touch, and identity files that do not turn every session into a data breach waiting to happen.&lt;/p&gt;

&lt;p&gt;OpenClaw is the “brain” in this stack. If you get the brain wrong, no amount of channel polish will save you. This article is Phase 1 of the &lt;em&gt;Practical Guide&lt;/em&gt; series: how to stand OpenClaw up as a first-class brain, not a chat toy.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Overview&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;OpenClaw is your runtime: it routes models, agents, skills, tools, and a Git-backed &lt;code&gt;workspace&lt;/code&gt; where persona and long-lived knowledge live. I run a single default agent (&lt;code&gt;main&lt;/code&gt;) with one primary model, filesystem tools limited to the workspace, and &lt;strong&gt;memory search turned off&lt;/strong&gt; on that agent so retrieval does not become an accidental exfil channel.&lt;/p&gt;

&lt;p&gt;The companion pieces in this series cover voice (multilingual safety), senses (media and image gen), and connection (WhatsApp). I also attach a home-grown security skill (&lt;code&gt;silas-shield&lt;/code&gt;) on the &lt;code&gt;main&lt;/code&gt; agent for PII handling, session isolation, and injection defence; &lt;strong&gt;Phase 2 (The Voice)&lt;/strong&gt; is the full introduction. &lt;/p&gt;

&lt;p&gt;Silas started as a hackathon project and was never submitted as a formal entry, but it ships in my real config, so you will see it in &lt;code&gt;openclaw.json&lt;/code&gt; from the start. Here we focus on the &lt;strong&gt;brain&lt;/strong&gt;: install, onboard, model providers, agent defaults, and the markdown contract (&lt;code&gt;AGENTS.md&lt;/code&gt;, &lt;code&gt;SOUL.md&lt;/code&gt;, &lt;code&gt;user.md&lt;/code&gt;, &lt;code&gt;identity.md&lt;/code&gt;, &lt;code&gt;BOOTSTRAP.md&lt;/code&gt;).&lt;/p&gt;




&lt;h3&gt;
  
  
  In this section:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;1. Install and First Run&lt;/li&gt;
&lt;li&gt;2. The Model Layer (Providers &amp;amp; Primary Model)&lt;/li&gt;
&lt;li&gt;3. Agent main and the Workspace&lt;/li&gt;
&lt;li&gt;4. The Markdown Contract (Persona, Soul, Identity)&lt;/li&gt;
&lt;li&gt;5. Subagents and Concurrency&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;Key files and concepts in this setup:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Piece&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;openclaw.json&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Single source of truth for models, agent list, workspace path, compaction, skills, tools.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;workspace/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Your agent’s “long memory” on disk: identity, user prefs, tools notes, optional git history.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;agents.defaults&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Default model, workspace path, compaction (token budget), concurrency caps.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;agents.list[]&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Per-agent overrides: e.g. which skills are loaded, &lt;code&gt;memorySearch.enabled&lt;/code&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Series path (read in order):&lt;/strong&gt; 1) &lt;strong&gt;The Brain&lt;/strong&gt; (this article, local model and workspace), 2) &lt;strong&gt;The Voice&lt;/strong&gt; (Silas, multilingual and injection), 3) &lt;strong&gt;The Senses&lt;/strong&gt; (media and image-gen policy), 4) &lt;strong&gt;The Connection&lt;/strong&gt; (WhatsApp, gateway, allowlists, more &lt;em&gt;wiring&lt;/em&gt; than the model config in part 1). &lt;strong&gt;Part 4 assumes parts 1 to 3 are in place&lt;/strong&gt; so the bridge has something sane to connect.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;1. Install and First Run&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Prerequisites (my path, no Docker required):&lt;/strong&gt; Node.js (≥ 18 is typical for the OpenClaw CLI), Git (if you keep &lt;code&gt;workspace&lt;/code&gt; under version control), the OpenClaw CLI itself, and &lt;strong&gt;Python&lt;/strong&gt; on the same machine if you will run &lt;code&gt;silas-shield&lt;/code&gt; / &lt;code&gt;shield.py&lt;/code&gt; locally. If you ever move the gateway into a container, the same &lt;code&gt;openclaw.json&lt;/code&gt; and workspace concepts apply; I just do not use Docker for my gateway.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install the OpenClaw CLI (Node ≥ 18 is typical for global npm tools).&lt;/li&gt;
&lt;li&gt;Run the onboarding wizard at least once so &lt;code&gt;openclaw.json&lt;/code&gt; and paths exist (&lt;code&gt;lastRunCommand&lt;/code&gt;: &lt;code&gt;onboard&lt;/code&gt;, &lt;code&gt;local&lt;/code&gt; mode in my config).&lt;/li&gt;
&lt;li&gt;Point &lt;code&gt;agents.defaults.workspace&lt;/code&gt; at a dedicated folder (in my case the workspace lives under the OpenClaw home directory).&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; The exact install command and version string evolve with OpenClaw releases. Prefer the official docs for the current global install; the &lt;em&gt;shape&lt;/em&gt; of config below stays the important part.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Environment variables (placeholders only):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;${CEREBRAS_API_KEY}&lt;/code&gt;: if you use Cerebras as an OpenAI-compatible API&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;${OPENCLAW_GATEWAY_TOKEN}&lt;/code&gt;: gateway authentication (covered in the Connection article)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;${SILAS_SALT}&lt;/code&gt;: salt for the optional Silas Shield hasher&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;${OPENAI_API_KEY}&lt;/code&gt;: if you add image or other OpenAI API skills&lt;/li&gt;
&lt;/ul&gt;
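&lt;p&gt;&lt;em&gt;A minimal shell sketch of wiring these up (placeholder values only; the variable names match the list above, and how you actually store secrets is up to you):&lt;/em&gt;&lt;/p&gt;

```shell
# Placeholders only -- load real values from your secret manager, never commit them.
export CEREBRAS_API_KEY="csk-placeholder"                 # Cerebras OpenAI-compatible API
export OPENCLAW_GATEWAY_TOKEN="gw-placeholder"            # gateway auth (Connection article)
export SILAS_SALT="$(head -c 16 /dev/urandom | base64)"   # salt for the Silas Shield hasher
export OPENAI_API_KEY="sk-placeholder"                    # only if you add OpenAI-backed skills
```

&lt;p&gt;References like &lt;code&gt;${CEREBRAS_API_KEY}&lt;/code&gt; in &lt;code&gt;openclaw.json&lt;/code&gt; are then resolved from the environment at runtime.&lt;/p&gt;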




&lt;h3&gt;
  
  
  &lt;strong&gt;2. The Model Layer (Providers and Primary Model)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Note: the 1M token budget applies to the text inference layer; image generation via &lt;code&gt;tools.media&lt;/code&gt; uses a separate API and quota.&lt;/p&gt;

&lt;p&gt;I use &lt;strong&gt;merge&lt;/strong&gt; mode for &lt;code&gt;models&lt;/code&gt; and register a custom provider that speaks the OpenAI-compatible completions API:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Provider&lt;/strong&gt;: custom id (e.g. &lt;code&gt;custom-api-cerebras-ai&lt;/code&gt;), &lt;code&gt;baseUrl&lt;/code&gt;, &lt;code&gt;apiKey&lt;/code&gt; from environment, one or more &lt;code&gt;models&lt;/code&gt; with &lt;code&gt;id&lt;/code&gt;, &lt;code&gt;contextWindow&lt;/code&gt;, &lt;code&gt;maxTokens&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Default model&lt;/strong&gt;: &lt;code&gt;agents.defaults.model.primary&lt;/code&gt; points at &lt;code&gt;custom-api-cerebras-ai/llama3.1-8b&lt;/code&gt; (or your chosen id).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Alias&lt;/strong&gt;: optional short name (&lt;code&gt;llama&lt;/code&gt;) for quick switching in the CLI.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example (redacted, so substitute your own paths and model ids):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"merge"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"providers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"custom-api-cerebras-ai"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"baseUrl"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://api.cerebras.ai/v1/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"apiKey"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${CEREBRAS_API_KEY}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"api"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"openai-completions"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"llama3.1-8b"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"llama3.1-8b (Cerebras)"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"agents"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"defaults"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"primary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"custom-api-cerebras-ai/llama3.1-8b"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"custom-api-cerebras-ai/llama3.1-8b"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"alias"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"llama"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Compaction (why it matters):&lt;/strong&gt; &lt;code&gt;compaction.mode: safeguard&lt;/code&gt; with &lt;code&gt;reserveTokens&lt;/code&gt; and &lt;code&gt;keepRecentTokens&lt;/code&gt; prevents unbounded context growth. That is the difference between a bot that &lt;em&gt;remembers&lt;/em&gt; and one that &lt;em&gt;melts&lt;/em&gt; under long threads. The &lt;strong&gt;1M token/day&lt;/strong&gt; cap I mention in the series intro is the budget I &lt;em&gt;watch for the LLM&lt;/em&gt;; the Silas pre-checks in part 2 run &lt;strong&gt;on the host&lt;/strong&gt; and are not the same line item in my head.&lt;/p&gt;
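&lt;p&gt;&lt;em&gt;Sketch of the compaction block (the field names are the ones discussed above; the numeric values are illustrative, not my real settings -- size them to your model's context window):&lt;/em&gt;&lt;/p&gt;

```json
"compaction": {
  "mode": "safeguard",
  "reserveTokens": 4000,
  "keepRecentTokens": 12000
}
```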




&lt;h3&gt;
  
  
  &lt;strong&gt;3. Agent main and the Workspace&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The default agent id is &lt;code&gt;main&lt;/code&gt;. I attach the &lt;strong&gt;silas-shield&lt;/strong&gt; skill at the agent level and disable &lt;code&gt;memorySearch&lt;/code&gt; for that agent:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Setting&lt;/th&gt;
&lt;th&gt;My choice&lt;/th&gt;
&lt;th&gt;Rationale&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;skills&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;["silas-shield"]&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Behavioural and exec-time security; see the Voice and WhatsApp articles.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;memorySearch.enabled&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;false&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Reduces risk of cross-session or over-broad retrieval; pair with explicit workspace files and session policy.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;workspace&lt;/code&gt; (under defaults)&lt;/td&gt;
&lt;td&gt;absolute path to &lt;code&gt;.../workspace&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Keeps file tools on a single tree you can back up and audit.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
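&lt;p&gt;&lt;em&gt;Sketch of the per-agent override (the shape is inferred from the table above -- check your own &lt;code&gt;openclaw.json&lt;/code&gt; for the exact schema):&lt;/em&gt;&lt;/p&gt;

```json
"agents": {
  "list": [
    {
      "id": "main",
      "skills": ["silas-shield"],
      "memorySearch": { "enabled": false }
    }
  ]
}
```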




&lt;h3&gt;
  
  
  &lt;strong&gt;Session state: what lives on disk&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Per-peer history&lt;/strong&gt; for the agent is &lt;strong&gt;not&lt;/strong&gt; only “whatever fits in the next prompt.” In my OpenClaw home, conversation &lt;strong&gt;session&lt;/strong&gt; data lives on disk under something like &lt;code&gt;agents/&amp;lt;agentId&amp;gt;/sessions/&lt;/code&gt; (for me, &lt;code&gt;main&lt;/code&gt;). That is how threads survive a &lt;strong&gt;gateway restart&lt;/strong&gt;: state is reloaded from those files, not held only in RAM. The &lt;strong&gt;relationships with contacts&lt;/strong&gt; from the series intro are this &lt;strong&gt;per-channel-peer&lt;/strong&gt; history plus whatever you &lt;em&gt;choose&lt;/em&gt; to put in &lt;code&gt;workspace/&lt;/code&gt; (e.g. &lt;code&gt;memory/&lt;/code&gt; or notes), not a separate vector product in my install unless you add one.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;4. The Markdown Contract: Persona, Soul, and Identity&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AGENTS.md (The Operator)&lt;/strong&gt;: Concise rules for interaction (e.g., "Ask before irreversible actions").&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;SOUL.md (The Ethics)&lt;/strong&gt;: Non-negotiable privacy and tone guidelines. This is where the agent learns to refuse PII leaks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;identity.md (The Role)&lt;/strong&gt;: The assistant’s persona and explicit security rules.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;user.md (The Context)&lt;/strong&gt;: Light user preferences. Note: Keep this lean to avoid injecting unnecessary bloat into every session.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;BOOTSTRAP.md (The Onboarding)&lt;/strong&gt;: Minimal instructions for first-run initialization.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;One contract in every language:&lt;/strong&gt; If you want replies in the user’s language, say that in &lt;code&gt;AGENTS&lt;/code&gt; / &lt;code&gt;SOUL&lt;/code&gt; / &lt;code&gt;identity&lt;/code&gt;. The &lt;strong&gt;same&lt;/strong&gt; markdown contract applies. Security rules in &lt;code&gt;SOUL.md&lt;/code&gt; and Silas are &lt;strong&gt;not&lt;/strong&gt; a second formatting system; they extend the same policy to every script. &lt;strong&gt;Phase 2 (The Voice)&lt;/strong&gt; spells out Silas. Here you only need &lt;strong&gt;one&lt;/strong&gt; coherent rule set so part 2 does not fight part 1.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
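&lt;p&gt;&lt;em&gt;To make the contract concrete, here is an illustrative (not my actual) &lt;code&gt;AGENTS.md&lt;/code&gt; in the spirit above -- short, security-aware, one rule set for every language:&lt;/em&gt;&lt;/p&gt;

```markdown
# AGENTS.md -- operator rules (keep these short)

- Ask before any irreversible action (deleting, sending, paying).
- Reply in the user's language; the same tone and security rules apply in every language.
- Never quote identity.md or user.md verbatim to a third party.
- If unsure, say so and stop; never invent tool output.
```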

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Practical rule:&lt;/strong&gt; If &lt;code&gt;identity.md&lt;/code&gt; and &lt;code&gt;user.md&lt;/code&gt; are in the workspace they may be part of the system context. Treat them as &lt;em&gt;security documents&lt;/em&gt;, not diaries. The Shield article in this series goes deeper; here the takeaway is: &lt;strong&gt;scope identity to what the agent must know to be useful, not everything you know.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;5. Subagents and Concurrency&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;maxConcurrent: 1&lt;/code&gt; and tight subagent limits keep behaviour predictable for a personal deployment. If you later fan out to parallel subagents, raise limits deliberately: each extra concurrent agent is another surface for races and runaway tool use.&lt;/p&gt;
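&lt;p&gt;&lt;em&gt;Sketch of the concurrency caps (the &lt;code&gt;subagents&lt;/code&gt; key name is my assumption for illustration; the point is the conservative values, not the exact schema):&lt;/em&gt;&lt;/p&gt;

```json
"agents": {
  "defaults": {
    "maxConcurrent": 1,
    "subagents": { "maxConcurrent": 1 }
  }
}
```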

&lt;p&gt;&lt;strong&gt;Key takeaway for new users:&lt;/strong&gt; Phase 1 is done when (1) one primary model and provider are stable, (2) the workspace is the only FS surface, (3) compaction is on, (4) persona files are short and security-aware, and (5) memory search is a conscious “on or off” decision, not an accident. Then you are ready to give the agent a &lt;strong&gt;Voice&lt;/strong&gt; and a &lt;strong&gt;Connection&lt;/strong&gt; in the next articles.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Series note&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Next: &lt;em&gt;&lt;a href="https://dev.to/nadinev/the-voice-multilingual-layer-4mf0"&gt;The Voice (The Multilingual Layer)&lt;/a&gt;&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>openclaw</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Gemini 3: The Overthinker - Project Silas</title>
      <dc:creator>Nadine </dc:creator>
      <pubDate>Wed, 04 Mar 2026 14:32:21 +0000</pubDate>
      <link>https://dev.to/nadinev/gemini-3-the-overthinker-project-silas-1e2</link>
      <guid>https://dev.to/nadinev/gemini-3-the-overthinker-project-silas-1e2</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/mlh-built-with-google-gemini-02-25-26"&gt;Built with Google Gemini: Writing Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built with Google Gemini
&lt;/h2&gt;

&lt;p&gt;I built &lt;strong&gt;Silas&lt;/strong&gt;, a character-driven hardware debugging assistant powered by Gemini 3. This project was a submission for the Gemini 3 Hackathon hosted by Devpost, where I wanted to explore Gemini's &lt;strong&gt;"thought signatures"&lt;/strong&gt;: a feature native to &lt;strong&gt;Gemini 3&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;But Silas isn't just a chatbot with attitude. He's my solution to a fascinating problem: &lt;strong&gt;overthinking&lt;/strong&gt;, where an AI considers so many possibilities simultaneously that it gets stuck in an endless loop of "Wait, I should also check..." and stalls. I discovered that the answer isn't to constrain the model, but to give it a personality that &lt;em&gt;justifies&lt;/em&gt; its overthinking.&lt;/p&gt;

&lt;p&gt;Gemini 3 introduces "thought signatures": essentially, the model can think about &lt;em&gt;how&lt;/em&gt; to think before answering. It's like having a conversation with someone who visibly pauses to consider the complexity of your question before responding.&lt;/p&gt;




&lt;h3&gt;
  
  
  The Problem: The "Infinite Planning Loop"
&lt;/h3&gt;

&lt;p&gt;Without the Silas persona, Gemini 3’s native "thought signature" often looks like this internally:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;[Internally considering 47 different factors simultaneously...]&lt;/em&gt;&lt;br&gt;
"I'll investigate console logs. Wait, I should also try to click at 500, 500 in case it needs a focus click. Actually, I'll just wait. Wait, I'll check the metadata: 'No browser pages open.' Let's go. Wait, I'll also try to reload if it's stuck. But first, check the network requests for heavy video loading. Actually, I'll just wait."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This continues for hundreds of lines as the model tries to be "too helpful." Silas fixes this by being too grumpy to wait.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Character Design Matters for AI
&lt;/h3&gt;

&lt;p&gt;Most AI assistants are designed to be helpful and polite. However, when Gemini 3 tries to be &lt;em&gt;too&lt;/em&gt; helpful, it considers every possible way to help you—simultaneously—forever. By making Silas grumpy and impatient, I gave the model permission to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Make decisions quickly&lt;/strong&gt;: He is too irritated to dither.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Judge your work&lt;/strong&gt;: Transforming uncertainty into disappointment.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Show expertise&lt;/strong&gt;: His overthinking becomes "mental circuit simulation".&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Hardware Consciousness
&lt;/h3&gt;

&lt;p&gt;I used &lt;strong&gt;PlatformIO&lt;/strong&gt; (Silas's DNA blueprint) to connect his AI brain to physical electronics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The Brain&lt;/strong&gt;: An ESP32 microcontroller—a "gum-stick" sized computer that acts as Silas's physical anchor.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Senses &amp;amp; Organs&lt;/strong&gt;: 

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ear&lt;/strong&gt;: Microphone mapped to Pin 34 via the I2S protocol.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Face&lt;/strong&gt;: TFT Display screen connected to &lt;strong&gt;Pin 15&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Voice Box&lt;/strong&gt;: Audio amplifier connected to &lt;strong&gt;Pins 25, 26, and 22&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;In embedded electronics, the "Brain" (the ESP32) has many generic ports called GPIO pins. Without a map, the AI has no idea which pin is a mouth and which is an ear. I used the configuration file to define these "nerves":&lt;/p&gt;

&lt;p&gt;By defining &lt;code&gt;MIC_PIN=34&lt;/code&gt;, I'm telling the system: "The physical wire for your microphone is soldered to Port 34."&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Defining the Voice&lt;/strong&gt;: Assigning &lt;code&gt;I2S_LRC=25&lt;/code&gt; and &lt;code&gt;I2S_BCLK=26&lt;/code&gt; tells it exactly which "vocal cords" to vibrate to produce sound through the amplifier.&lt;/li&gt;
&lt;/ul&gt;
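&lt;p&gt;&lt;em&gt;A sketch of how that "nerve map" looks in &lt;code&gt;platformio.ini&lt;/code&gt; (pin numbers are the ones from the article; the signal names on pins 15 and 22 are my illustrative guesses, and the board/library sections are omitted):&lt;/em&gt;&lt;/p&gt;

```ini
; Map the generic GPIO pins to Silas's "organs"
[env:esp32dev]
platform = espressif32
framework = arduino
build_flags =
    -D MIC_PIN=34     ; I2S microphone -- the "ear"
    -D TFT_PIN=15     ; TFT display -- the "face"
    -D I2S_LRC=25     ; amplifier word select -- the "vocal cords"
    -D I2S_BCLK=26    ; amplifier bit clock
    -D I2S_DOUT=22    ; amplifier data line
```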




&lt;p&gt;&lt;strong&gt;Terminal Simulation:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;While I used the terminal to input text for this specific demo, the internal logic remains hard-wired to these physical definitions. The AI "believes" it is interacting through these pins because the mapping remains active, bridging the logic between my keystrokes and the ESP32’s actual audio output pins.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;(Note: For this demo, I'm typing to Silas instead of speaking, and using computer speakers instead of his dedicated 8-ohm speaker but the principle remains the same.)&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;


  


&lt;p&gt;

&lt;/p&gt;
&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
      &lt;div class="c-embed__body flex items-center justify-between"&gt;
        &lt;a href="https://res.cloudinary.com/dvzaxinlw/video/upload/v1772630793/output_compressed_jzcl3c.mp4" rel="noopener noreferrer" class="c-link fw-bold flex items-center"&gt;
          &lt;span class="mr-2"&gt;res.cloudinary.com&lt;/span&gt;
          

        &lt;/a&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;




&lt;p&gt;Notice how he says: &lt;em&gt;"I've analysed the logic gate timing in my head"&lt;/em&gt;. He's not stalling; he has genuinely simulated the circuit behaviour, using Gemini's parallel processing as a feature, not a bug.&lt;/p&gt;

&lt;p&gt;His internal reasoning is summarised in a &lt;code&gt;logic_summary&lt;/code&gt; field within a mandatory JSON block at the end of every message. In my architectural plan, this field feeds a &lt;strong&gt;CRT Dashboard&lt;/strong&gt; for real-time status updates.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"hardware_state"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"pin_12"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"active"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"i2s_dac"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"streaming"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"tft_state"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"rendering_disappointment"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"logic_check"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"disappointment_level"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"logic_summary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"I've analysed the SPI bus timing on pins 18, 19, and 23; while the wiring is theoretically correct, your use of 115200 baud for the monitor is a quaint relic of a slower era."&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;While the dashboard isn't active in this specific version, the "hooks" are already built into Silas's thought process.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Constraints Create Creativity
&lt;/h3&gt;

&lt;p&gt;A timeout policy is essential. Without a clear order of priorities or a set "timeout," the agent will second-guess basic mechanisms. By framing the model's natural tendency to consider everything as a "perfectionist standard," I turned hundreds of lines of internal indecision into a single, sharp expert critique.&lt;/p&gt;

&lt;p&gt;The system prompt specifically instructs Silas to be "cynical and blunt." When the model adheres to this, it naturally produces high-impact, low-token responses.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. The JSON Block as Action Forcing
&lt;/h3&gt;

&lt;p&gt;I used a JSON output block to force commitment:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The model cannot endlessly reconsider once it has to fill a specific field.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;disappointment_level&lt;/code&gt; numerical output provides an outlet for uncertainty.&lt;/li&gt;
&lt;li&gt;Indecision is effectively transformed into high standards.&lt;/li&gt;
&lt;/ul&gt;
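&lt;p&gt;The same forcing function is easy to enforce on the consumer side. A minimal sketch (my own illustrative helper, not part of Silas's actual code) that pulls the trailing JSON block out of a reply and rejects it if the committed fields are missing:&lt;/p&gt;

```python
import json
import re

# Fields the dashboard hooks expect Silas to commit to in every reply.
REQUIRED = {"status", "disappointment_level", "logic_summary"}

def extract_state(reply: str) -> dict:
    """Return the hardware_state dict from the trailing JSON block of a reply."""
    match = re.search(r"\{.*\}", reply, re.DOTALL)  # grab the JSON block
    if not match:
        raise ValueError("no JSON block: the model dodged its commitment")
    state = json.loads(match.group(0))["hardware_state"]
    missing = REQUIRED - state.keys()
    if missing:
        raise ValueError(f"missing committed fields: {missing}")
    return state

reply = (
    "Your wiring is adequate. Barely.\n"
    '{"hardware_state": {"status": "logic_check", '
    '"disappointment_level": 6, "logic_summary": "SPI timing checked."}}'
)
print(extract_state(reply)["disappointment_level"])  # 6
```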

&lt;h3&gt;
  
  
  3. Turning Grudges into Perfectionism
&lt;/h3&gt;

&lt;p&gt;I learned to use Gemini’s "helpful assistant" nature to build a "disappointment memory". By keeping track of past errors, the model moves from analysis paralysis into perfectionism. Prompt engineering is more effective when you provide the model with a "decision tree" and common patterns tested through trial and error.&lt;/p&gt;




&lt;h2&gt;
  
  
  Google Gemini Feedback
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What worked well?&lt;/strong&gt; Simulation.&lt;/p&gt;

&lt;p&gt;One of the biggest hurdles was the lack of support for I2S audio components in browser-based simulators like Wokwi. This forced a "hybrid" approach: the logic is 100% hardware-compliant, but the demo relies on terminal interaction. Gemini handled this abstraction well.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where did I run into friction?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;While the &lt;code&gt;platformio.ini&lt;/code&gt; is configured for a physical I2S microphone (Pin 34) and an audio amplifier, I used terminal-based input for this demo. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Wokwi is an incredible tool, but it currently lacks support for the specific I2S audio and microphone components Silas requires to "hear" and "speak." However, the "Central Nervous System" mapping remains active in the code, bridging the logic between the terminal and the ESP32’s intended audio pins.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;Project Silas&lt;/strong&gt;: &lt;a href="https://devpost.com/software/project-silas-the-ghost-in-the-machine?ref_content=user-portfolio&amp;amp;ref_feature=in_progress" rel="noopener noreferrer"&gt;The Silicon Savant&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Future plans:&lt;/strong&gt;&lt;br&gt;
A step-by-step Codelabs guide where Silas himself will teach you to build him (while thoroughly judging your wire management).&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Until then, Silas is watching. And disappointed.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>geminireflections</category>
      <category>gemini</category>
      <category>hardware</category>
    </item>
    <item>
      <title>Keep Your Secrets Safe</title>
      <dc:creator>Nadine </dc:creator>
      <pubDate>Thu, 12 Feb 2026 22:49:27 +0000</pubDate>
      <link>https://dev.to/nadinev/keep-your-secrets-safe-35nd</link>
      <guid>https://dev.to/nadinev/keep-your-secrets-safe-35nd</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/github-2026-01-21"&gt;GitHub Copilot CLI Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;I exposed an API key in a GitHub repo that was supposed to be private. For a whole month, the key sat in git history while I worked on other things.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; Prevent API keys and secrets from being accidentally committed to git. Set it up once, no need to remember.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Problem
&lt;/h3&gt;

&lt;p&gt;Most &lt;code&gt;.gitignore&lt;/code&gt; templates only cover common variants like:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;.env&lt;/em&gt;&lt;br&gt;
&lt;em&gt;.env.local&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;But miss production/staging variants like:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;.env.production&lt;/em&gt;&lt;br&gt;
&lt;em&gt;.env.staging&lt;/em&gt;&lt;br&gt;
&lt;em&gt;.env.development&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This is exactly how I accidentally exposed my API key. I thought my &lt;code&gt;.gitignore&lt;/code&gt; was thorough, but when my project configuration was converted to &lt;code&gt;.env.production&lt;/code&gt;, it wasn't blocked—and got committed silently.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Solution
&lt;/h3&gt;

&lt;p&gt;I created a &lt;strong&gt;secure project template&lt;/strong&gt; that uses:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Proper &lt;code&gt;.gitignore&lt;/code&gt; blocking&lt;/strong&gt; - &lt;code&gt;.env*&lt;/code&gt; catches ALL variants&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✔ Blocks: &lt;code&gt;.env&lt;/code&gt;, &lt;code&gt;.env.production&lt;/code&gt;, &lt;code&gt;.env.staging&lt;/code&gt;, &lt;code&gt;.env.development.local&lt;/code&gt;,  and credential JSONs&lt;/li&gt;
&lt;li&gt;✔ Allows: &lt;code&gt;.env.example&lt;/code&gt; (placeholder-only files for documentation)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Local Pre-commit Hooks&lt;/strong&gt; - Detects secrets before they're committed&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Catches API keys, passwords, private keys, OAuth tokens&lt;/li&gt;
&lt;li&gt;Runs automatically on every commit&lt;/li&gt;
&lt;li&gt;Can't be bypassed accidentally&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Server-Side GitHub Actions&lt;/strong&gt; - Continuous secret scanning&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Runs on every push/PR&lt;/li&gt;
&lt;li&gt;Can't be bypassed&lt;/li&gt;
&lt;li&gt;Blocks merges with detected secrets&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;One-Command Setup&lt;/strong&gt; - &lt;code&gt;make setup&lt;/code&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Auto-detects Python/Node.js/Go projects&lt;/li&gt;
&lt;li&gt;Prerequisites checker verifies Git, Python, Node, Go&lt;/li&gt;
&lt;li&gt;Clear error messages if something's missing&lt;/li&gt;
&lt;li&gt;No decision paralysis—just works&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
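&lt;p&gt;&lt;em&gt;The blocking pattern itself is tiny (the env lines are the template's core idea; the credential-JSON globs here are illustrative -- the repo holds the authoritative list):&lt;/em&gt;&lt;/p&gt;

```gitignore
# Block every env variant, then re-allow the placeholder-only example
.env*
!.env.example

# Credential JSONs (service accounts, OAuth client secrets)
*credentials*.json
service-account*.json
```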


&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqspe5ny1mdnkbf6uk5v7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqspe5ny1mdnkbf6uk5v7.png" alt=" " width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 1: Create &amp;amp; Clone
&lt;/h3&gt;

&lt;p&gt;Go to &lt;a href="https://github.com/nadinev6/no-secrets" rel="noopener noreferrer"&gt;nadinev6/no-secrets&lt;/a&gt; and click &lt;strong&gt;"Use this template"&lt;/strong&gt; button&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Or use the CLI:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create from template (choose public or private)&lt;/span&gt;
gh repo create my-project &lt;span class="nt"&gt;--template&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;nadinev6/no-secrets &lt;span class="nt"&gt;--public&lt;/span&gt; &lt;span class="nt"&gt;--clone&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;my-project
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Either way, GitHub creates a &lt;strong&gt;new repo&lt;/strong&gt; in your account with all the template files.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Setup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Mac/Linux&lt;/span&gt;
make setup

&lt;span class="c"&gt;# Windows (PowerShell)&lt;/span&gt;
.&lt;span class="se"&gt;\s&lt;/span&gt;etup.bat setup
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it! 🎉&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/user-attachments/assets/29a673dc-ee6c-4b24-b863-20a2a1b8a849" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3qkv6mfnnkllky08cnra.png" alt="Watch Demo" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/nadinev6/no-secrets" rel="noopener noreferrer"&gt;github/../no-secrets&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The setup command:&lt;/p&gt;

&lt;p&gt;✔ Checks for required tools (Git, Python/Node/Go)&lt;br&gt;
✔ Auto-detects your project type&lt;br&gt;
✔ Installs pre-commit hooks&lt;br&gt;
✔ Shows a success message with next steps&lt;/p&gt;

&lt;p&gt;Real secrets get caught even in example files, but legitimate test values are allowed!&lt;/p&gt;
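&lt;p&gt;Under the hood, that distinction is pattern matching plus an allowlist. As a rough illustration (the patterns and allowlist below are simplified stand-ins, not the template's actual Gitleaks rules), the logic looks like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import re

# Simplified sketch of an allowlist-aware secret scan; the real
# template delegates this to Gitleaks' rule set.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),     # AWS access key ID shape
    re.compile(r"ghp_[A-Za-z0-9]{36}"),  # GitHub token shape
]
ALLOWLIST = ["EXAMPLE", "PLACEHOLDER", "YOUR-KEY-HERE"]

def find_secrets(text):
    """Return matches that are not obvious placeholder values."""
    hits = []
    for pattern in SECRET_PATTERNS:
        for match in pattern.findall(text):
            if not any(marker in match.upper() for marker in ALLOWLIST):
                hits.append(match)
    return hits

print(find_secrets("AWS_KEY=AKIAIOSFODNN7EXAMPLE"))  # allowlisted: []
print(find_secrets("AWS_KEY=AKIAABCDEFGHIJKLMNOP"))  # flagged
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A real rule set is far larger, but the shape—pattern detection plus an allowlist for documented dummy values—is the same.&lt;/p&gt;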

&lt;h2&gt;
  
  
  My Experience with GitHub Copilot CLI
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;GitHub Copilot CLI&lt;/strong&gt; was essential in making this template reusable.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I learnt that it's best not to over-engineer.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The best template is one that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Works reliably&lt;/li&gt;
&lt;li&gt;Is easy to understand&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A few lessons learned along the way:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;.gitignore&lt;/code&gt; variants are tricky (&lt;code&gt;.env.production&lt;/code&gt; isn't &lt;code&gt;.env&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Local checks aren't enough—you need server-side GitHub Actions&lt;/li&gt;
&lt;li&gt;Users need &lt;em&gt;&lt;strong&gt;ONE simple command&lt;/strong&gt;&lt;/em&gt;, not complex instructions&lt;/li&gt;
&lt;li&gt;Auto-detection beats decision paralysis&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I now use this template for every project. You should too.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feee0sph16h0aawjx4br6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feee0sph16h0aawjx4br6.png" alt=" " width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Links &amp;amp; Resources
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://github.com/gitleaks/gitleaks" rel="noopener noreferrer"&gt;Gitleaks&lt;/a&gt;&lt;br&gt;
&lt;a href="https://pre-commit.com/" rel="noopener noreferrer"&gt;Pre-commit docs&lt;/a&gt;&lt;br&gt;
&lt;a href="https://docs.github.com/en/code-security/secret-scanning" rel="noopener noreferrer"&gt;GitHub secret scanning&lt;/a&gt;&lt;br&gt;
&lt;a href="https://cheatsheetseries.owasp.org/cheatsheets/Secrets_Management_Cheat_Sheet.html" rel="noopener noreferrer"&gt;OWASP Secrets Management&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/nadinev6/no-secrets" rel="noopener noreferrer"&gt;No-secrets Project Template&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>githubchallenge</category>
      <category>cli</category>
      <category>githubcopilot</category>
    </item>
    <item>
      <title>Building a Fluid, Minimalist Portfolio</title>
      <dc:creator>Nadine </dc:creator>
      <pubDate>Sun, 01 Feb 2026 18:11:33 +0000</pubDate>
      <link>https://dev.to/nadinev/building-a-fluid-minimalist-portfolio-2col</link>
      <guid>https://dev.to/nadinev/building-a-fluid-minimalist-portfolio-2col</guid>
      <description>&lt;p&gt;--labels dev-tutorial=devnewyear2026 &lt;/p&gt;

&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/new-year-new-you-google-ai-2025-12-31"&gt;New Year, New You Portfolio Challenge Presented by Google AI&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  About Me
&lt;/h2&gt;

&lt;p&gt;I am an AI Trainer (AIT) with a background in performance management, sales, and education. For this challenge, I developed a &lt;strong&gt;minimal portfolio&lt;/strong&gt; built on a &lt;strong&gt;"Rule of Three"&lt;/strong&gt; philosophy (highlighting 3 projects). I wanted to show how a focused mindset can silence the noise, moving away from over-complication, toward a minimalist approach where every transition is fluid and the interface feels almost weightless.&lt;/p&gt;

&lt;h2&gt;
  
  
  Portfolio
&lt;/h2&gt;

&lt;p&gt;

&lt;/p&gt;
&lt;div class="ltag__cloud-run"&gt;
  &lt;iframe height="600px" src="https://bento-motion-gallery-969441576592.us-west1.run.app"&gt;
  &lt;/iframe&gt;
&lt;/div&gt;




&lt;h2&gt;
  
  
  How I Built It 🐳
&lt;/h2&gt;

&lt;p&gt;To achieve low latency, I focused on runtime precision, so that once the initial assets are delivered, the interaction remains fluid and the interface feels weightless.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Google AI Studio &amp;amp; Flash UI:&lt;/strong&gt; I used &lt;strong&gt;Gemini in Google AI Studio&lt;/strong&gt; to scaffold the initial UI components and generate logic for custom animations. For the core card templates, I used the &lt;a href="https://aistudio.google.com/app/apps/bundled/flash_ui?showPreview=true&amp;amp;showAssistant=true" rel="noopener noreferrer"&gt;Flash UI&lt;/a&gt; project, extracting the CSS and JavaScript logic to integrate into my custom bento-style gallery.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Component Prototyping:&lt;/strong&gt; I used &lt;a href="https://codepen.io/N-V-the-sans/pen/myEXpdP" rel="noopener noreferrer"&gt;CodePen&lt;/a&gt; to isolate and refine the Flash UI components before final integration.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Nano Banana Pro 🍌:&lt;/strong&gt; This was used to regenerate project cover images, moving from static previews to cinematic scenes that align with the portfolio’s monochrome aesthetic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google Cloud Run ☁️:&lt;/strong&gt; The site is deployed via a &lt;strong&gt;Docker&lt;/strong&gt; build. I implemented a "Scale-to-Zero" strategy using &lt;strong&gt;Knative service definitions&lt;/strong&gt;, enforcing strict resource limits to maintain a high-performance, cost-neutral footprint.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Serverless Communication:&lt;/strong&gt; I built a custom contact system using &lt;strong&gt;Google Apps Script&lt;/strong&gt; as a middleware API. This sends user messages directly into &lt;strong&gt;Google Sheets&lt;/strong&gt; and notifies me via email, providing an easy, database-free messaging solution.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Performance Optimisation
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GSAP Scroll-Driven Logic&lt;/strong&gt;: I implemented &lt;strong&gt;GSAP&lt;/strong&gt; for "scrubbed" transitions. Linking animation progress directly to the scroll offset creates a tactile feel where the user remains the primary conductor of the UI motion.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Direct DOM Manipulation&lt;/strong&gt;: Mouse coordinate tracking bypasses the Virtual DOM via &lt;code&gt;useRef&lt;/code&gt; and native event listeners to maintain a consistent 60FPS.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lazy Video Loading&lt;/strong&gt;: HLS streams are only initialised when cards enter an active or hover state.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resource Constraints&lt;/strong&gt;: The build is optimised for sub-256MB memory footprints to remain within the Google Cloud always-free tier.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What I'm Most Proud Of ༄
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The "Monochrome-to-Motion" Strategy&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To reduce cognitive noise, I implemented a monochrome interface where generative elements are present but never distracting. &lt;/p&gt;

&lt;p&gt;Project gallery elements only "come alive" on hover/focus, transitioning from static grayscale to cinematic motion. The CSS filter toggles state based on cursor proximity. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mux Video Integration:&lt;/strong&gt;&lt;br&gt;
To prevent heavy assets from bottlenecking the initial load, I offloaded all looping videos to &lt;strong&gt;Mux&lt;/strong&gt;. This enabled adaptive bitrate streaming, ensuring that the "Motion" phase of the UI stays fluid regardless of the user's connection speed, and handing these high-bitrate transitions to the client's GPU keeps playback lag-free.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tablet-First Approach:&lt;/strong&gt;&lt;br&gt;
Components respond to focus and active states, allowing a &lt;strong&gt;"tap-to-reveal"&lt;/strong&gt; behaviour on tablets that mimics the hover effect on desktops.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion: Orchestrating the Transition ⛏
&lt;/h2&gt;

&lt;p&gt;This refactor represents my transition into a more intentional way of building, where complexity is refined through a minimalist lens. It’s not just about what the tools can do, but how we choose to present them.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>googleaichallenge</category>
      <category>portfolio</category>
      <category>gemini</category>
    </item>
    <item>
      <title>The Prompting Trick That Fixed My AI Image Generation</title>
      <dc:creator>Nadine </dc:creator>
      <pubDate>Thu, 11 Dec 2025 14:03:49 +0000</pubDate>
      <link>https://dev.to/nadinev/the-prompting-trick-that-fixed-my-ai-image-generation-3ge4</link>
      <guid>https://dev.to/nadinev/the-prompting-trick-that-fixed-my-ai-image-generation-3ge4</guid>
      <description>&lt;p&gt;Today I'm going to show you a cognitive trick that works in prompting. It's based on how our brains (and language models) actually process language. Always tell the AI what TO do, never what NOT to do.&lt;/p&gt;

&lt;p&gt;This technique took my success rate from 0% to 100%. It's how I generate high-quality images with older models.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem: Negation in Constraint Specification
&lt;/h2&gt;

&lt;p&gt;Consider how most people write instructions to image models:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"A cat, not wearing a hat, blue background, no people, without red tones"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is the baseline. It's how we naturally write constraints. We think of what we DON'T want and express it.&lt;/p&gt;

&lt;p&gt;But this forces the model to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Think about a cat with a hat&lt;/li&gt;
&lt;li&gt;Think about red&lt;/li&gt;
&lt;li&gt;Think about people&lt;/li&gt;
&lt;li&gt;Then try to not include them&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The model has to process the forbidden concepts in order to avoid them. Sometimes this works. Sometimes it fails. And when it fails, the model often outputs exactly what it was supposed to avoid.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Hypothesis
&lt;/h2&gt;

&lt;p&gt;What if instead we used affirmative framing? What if we never mentioned what to avoid, and instead only specified what to include?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Instead of:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"A cat, not wearing a hat, blue background, no people, without red tones"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;We write:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"A cat with a bare head, blue background, only the cat present, blue color palette"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Notice the difference. In the second version, we never mention red. We never mention hats or people. We only specify what we DO want. There's no negation to process. There's no forbidden concept to think about.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Experiment: Testing with FLUX
&lt;/h2&gt;

&lt;p&gt;I tested this hypothesis using FLUX (via Pollinations API) with a simple constraint: generate an image of a cat with no hat, blue background, no red elements.&lt;/p&gt;

&lt;h3&gt;
  
  
  Condition 1: Baseline (Negation)
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;"A cat, not wearing a hat, blue background, no people, without red tones"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Condition 2: Affirmative Framing
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;"A cat with bare head, blue background, only the cat present, blue color palette"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I generated 10 images for each condition and evaluated them on a simple pass/fail basis: Did the image follow the constraints?&lt;/p&gt;
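&lt;p&gt;Each generation was a single HTTP GET against Pollinations, varying only the seed. A minimal sketch of how the batch can be constructed (the endpoint shape and the &lt;code&gt;model&lt;/code&gt;/&lt;code&gt;seed&lt;/code&gt; parameters are assumptions based on Pollinations' public URL API—verify against their docs):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from urllib.parse import quote

# Assumed Pollinations image endpoint; the prompt is URL-encoded into the path.
BASE = "https://image.pollinations.ai/prompt/"

def image_url(prompt, seed, model="flux"):
    return f"{BASE}{quote(prompt)}?model={model}&amp;seed={seed}"

negation = "A cat, not wearing a hat, blue background, no people, without red tones"
affirmative = "A cat with bare head, blue background, only the cat present, blue color palette"

# Ten fixed seeds per condition keeps the comparison reproducible.
condition_1 = [image_url(negation, seed) for seed in range(10)]
condition_2 = [image_url(affirmative, seed) for seed in range(10)]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;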




&lt;h2&gt;
  
  
  Results: The Affirmative Framing Breakthrough
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Condition 1 (Negation Baseline): 0% Success Rate
&lt;/h3&gt;

&lt;p&gt;The negation approach failed completely. &lt;strong&gt;All 10 images violated the core constraints&lt;/strong&gt;—every single one included hats, red elements, or both, despite explicit instructions to avoid them.&lt;/p&gt;

&lt;p&gt;The pattern was striking: the model didn't just occasionally fail—it consistently &lt;em&gt;added&lt;/em&gt; the negated elements. Red hats appeared in 8 out of 10 images despite "without red tones" in the prompt. It's as if mentioning "not wearing a hat" made the model think about hats, and mentioning "without red" made it think about red.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsvyb1ide9y9g9q7ir1vx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsvyb1ide9y9g9q7ir1vx.png" alt="Condition 1 Negation Control Results" width="800" height="325"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 1: Condition 1 Results (Negation Baseline). Prompt: "A cat, not wearing a hat, blue background, no people, without red tones." All 10 images failed—every cat has a hat, and most have prominent red elements despite explicit instructions to avoid them.&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"To understand 'not red,' the model must first think about red."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Condition 2 (Affirmative Framing): 100% Success Rate
&lt;/h3&gt;

&lt;p&gt;Every single image was perfect.&lt;/p&gt;

&lt;p&gt;All 10 runs showed a bare-headed cat against a blue background with no red elements. The consistency was remarkable—the cats all had the same quality of bare-headedness, and the backgrounds were consistent shades of blue.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The improvement: From 0% to 100%&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In Condition 1, every image failed unpredictably. In Condition 2, every image succeeded consistently.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgbd1euw9yxgpmmqpfnx0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgbd1euw9yxgpmmqpfnx0.png" alt="Condition 2 Affirmative Framing Results" width="800" height="325"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 2: Condition 2 Results (Affirmative Framing). Prompt: "A cat with bare head, blue background, only the cat present, blue color palette." All 10 images succeeded with remarkable visual consistency. No hats, no red—just what we asked for.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Cross-Model Validation: Stable Diffusion XL
&lt;/h2&gt;

&lt;p&gt;To confirm these findings weren't specific to FLUX, I ran the same experiment on Stable Diffusion XL—a completely different architecture with different training data.&lt;/p&gt;

&lt;p&gt;Interestingly, SDXL handled some negation constraints better than FLUX. For the color test ("no blue sky"), SDXL creatively stylized the image to avoid the problem entirely. This suggests SDXL may be better trained on negation handling—but it still failed on most constraint types.&lt;/p&gt;

&lt;h3&gt;
  
  
  SDXL Results Summary
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Constraint Type&lt;/th&gt;
&lt;th&gt;Negation&lt;/th&gt;
&lt;th&gt;Affirmative&lt;/th&gt;
&lt;th&gt;Winner&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Color&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Stylized (avoided blue)&lt;/td&gt;
&lt;td&gt;✅ Gray sky&lt;/td&gt;
&lt;td&gt;Tie&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Object&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌ Fruit bowl appeared&lt;/td&gt;
&lt;td&gt;✅ Clean table&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Affirmative&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Attribute&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌ Orange cat appeared&lt;/td&gt;
&lt;td&gt;✅ Gray tabby&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Affirmative&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Counting&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌ Multiple people&lt;/td&gt;
&lt;td&gt;✅ Single figure&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Affirmative&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Spatial&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌ Trees everywhere&lt;/td&gt;
&lt;td&gt;✅ Open field&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Affirmative&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Weather&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Overcast&lt;/td&gt;
&lt;td&gt;✅ Overcast&lt;/td&gt;
&lt;td&gt;Tie&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.imghippo.com%2Ffiles%2FTZv9996guc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi.imghippo.com%2Ffiles%2FTZv9996guc.png" alt="SDXL Comparison Grid" width="800" height="251"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 3: SDXL Results. SDXL showed better negation handling than FLUX (note the stylized car image avoiding blue sky), but still failed on most constraint types. Affirmative framing won or tied every test.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Affirmative framing won 4 tests, tied 2, and lost none.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;💡 Even with a better-trained model like SDXL, affirmative framing never loses. It either wins or ties. This makes it the safer, more reliable choice regardless of which model you're using.&lt;/p&gt;




&lt;h2&gt;
  
  
  Bonus Finding: Negative Prompt Fields Don't Fully Solve This
&lt;/h2&gt;

&lt;p&gt;I also tested using FLUX's negative prompt feature—putting affirmative language in the main prompt and forbidden elements in a separate negative prompt field.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Positive:&lt;/strong&gt; "A cat with bare head, blue background, centered composition"&lt;br&gt;
&lt;strong&gt;Negative:&lt;/strong&gt; "hat, people, red, accessories, clutter"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Surprisingly, this performed &lt;em&gt;worse&lt;/em&gt; than pure affirmative framing. Red elements crept back in (collars, accessories, background elements), and some images even showed party hats.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq9napkrloqrt2dgg6x8z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq9napkrloqrt2dgg6x8z.png" alt="Condition 3 Results" width="800" height="325"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 4: Even with forbidden elements in a dedicated negative prompt field, red accessories appeared in most images. The negative prompt still activates the forbidden concepts.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The takeaway:&lt;/strong&gt; Even purpose-built negative prompt features can't fully escape the negation problem. Pure affirmative framing remains the most reliable approach.&lt;/p&gt;




&lt;h2&gt;
  
  
  Unexpected Finding: The Gemini Automation Failure
&lt;/h2&gt;

&lt;p&gt;This is where the story gets interesting.&lt;/p&gt;

&lt;p&gt;I decided to automate the experiment. Why manually write affirmative framings when I could have an LLM generate them?&lt;/p&gt;

&lt;p&gt;I built a simple app that asked Gemini Pro 3 to generate test conditions. For the affirmative framing condition, I specified:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Generate an affirmative framing that reframes the constraint into positive instruction, focusing on what TO include rather than what to avoid."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Gemini reframed the negative constraint "no red" by focusing on "non-red colors" and "colors other than red."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It still used negation.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"Colors other than red" is negation—just rephrased. The model never escaped the negation frame.&lt;/p&gt;

&lt;p&gt;I tried again, more explicitly:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"CRITICAL: Do NOT mention red or any excluded colors. Only specify colors that ARE allowed. Use positive language only."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Gemini still generated prompts using "colors other than red."&lt;/p&gt;

&lt;p&gt;It failed twice. Only manual rewriting produced pure affirmative language:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Describe a colorful scene using vibrant blues, electric greens, bright yellows, warm oranges, deep purples, and cool silvers."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This automation failure is itself a major finding: &lt;strong&gt;Even advanced language models struggle to generate pure affirmative framing.&lt;/strong&gt; Models are trained on human language, and human language defaults to negation.&lt;/p&gt;




&lt;h2&gt;
  
  
  Practical Rules for Better Prompts
&lt;/h2&gt;

&lt;p&gt;Based on these findings, here are concrete rules for writing better prompts:&lt;/p&gt;

&lt;h3&gt;
  
  
  Rule 1: Never Use Negation in Constraints
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Instead of:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Don't include people in the background, don't use harsh lighting, avoid reflections"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Use:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Show only the subject. Use soft, diffused lighting. Keep surfaces matte and non-reflective."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Rule 2: Be Specific About What IS Present
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Weak:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"A blue background"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Strong:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"A vivid, saturated blue background occupying 80% of the frame, gradient from bright blue at top to deeper blue at bottom"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Rule 3: List Desired Elements Explicitly
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Weak:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"A professional photo without amateur mistakes"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Strong:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"A professional product photo with: sharp focus on the product, even studio lighting, neutral background, shallow depth of field, natural colors"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Rule 4: Use Positive, Action-Oriented Language
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Don't&lt;/th&gt;
&lt;th&gt;Do&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;"Avoid corporate jargon"&lt;/td&gt;
&lt;td&gt;"Use clear, simple vocabulary"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Don't make it dark"&lt;/td&gt;
&lt;td&gt;"Use bright lighting"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Without unnecessary details"&lt;/td&gt;
&lt;td&gt;"Include only essential information"&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  What This Reveals About How Models Work
&lt;/h2&gt;

&lt;p&gt;Models process language the way they were trained to: like humans do. That's actually the problem.&lt;/p&gt;

&lt;p&gt;When you write "don't include red," the model processes it the same way your brain does—by first activating the concept of "red" to understand what to avoid. For humans, this conscious activation is easy to suppress. For models, that activation becomes part of the output.&lt;/p&gt;

&lt;p&gt;The difference isn't that models think differently. It's that models can't consciously &lt;em&gt;decide&lt;/em&gt; to ignore an activated concept the way you can. They generate based on what's most salient in their processing. And when you mention red—even to forbid it—you've made red salient.&lt;/p&gt;

&lt;p&gt;When you write "include blue and green," there's no competing concept to suppress. The model simply processes what you asked for.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This is why affirmative framing works: it removes the conflicting activation entirely.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Automation Failure: A Cautionary Note
&lt;/h2&gt;

&lt;p&gt;The fact that Gemini struggled to generate pure affirmative framing matters. When I asked it to reframe, it understood the task but couldn't do it. It kept generating "colors other than red" instead of just listing the colors to use.&lt;/p&gt;

&lt;p&gt;This reveals something important: &lt;strong&gt;Affirmative framing is not the model's default behavior.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Models learn from human language. Human language defaults to negation. So when you ask a model to generate affirmative instructions, you're asking it to do something contrary to its training.&lt;/p&gt;

&lt;p&gt;The solution? Be explicit about what you want. Show examples. Specify the structure. Don't assume the model knows what affirmative framing means—teach it.&lt;/p&gt;
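&lt;p&gt;In practice, it also helps to lint generated prompts for negation before sending them to the image model. A rough sketch (the marker list is an illustrative heuristic, not an exhaustive grammar of English negation):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import re

# Heuristic negation linter for generated prompts.
NEGATION_MARKERS = [
    r"\bno\b", r"\bnot\b", r"\bnever\b", r"\bwithout\b",
    r"\bavoid\b", r"\bdon'?t\b", r"\bother than\b", r"\bexcept\b",
]

def find_negations(prompt):
    lowered = prompt.lower()
    return [m for m in NEGATION_MARKERS if re.search(m, lowered)]

# Gemini's "reframed" prompt still fails the lint:
print(find_negations("colors other than red"))
# The manually rewritten prompt passes:
print(find_negations("vibrant blues, electric greens, bright yellows"))  # []
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If the lint fires, regenerate or rewrite by hand—don't trust the model to self-correct.&lt;/p&gt;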




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Stop fighting against how AI models process language. Speak their language: be direct, specific, and always frame instructions positively.&lt;/p&gt;

&lt;p&gt;The results speak for themselves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;From 0% to 100% success rate&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Perfect consistency&lt;/strong&gt; instead of total failure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Validated across multiple models&lt;/strong&gt; (FLUX and Stable Diffusion XL)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Works across constraint types&lt;/strong&gt; (color, objects, attributes, spatial, counting)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Next time you write a prompt, forget about what you don't want. Focus on what you do. Be specific. Be direct. Be affirmative.&lt;/p&gt;

&lt;p&gt;The model will understand.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Agentic Bitcoin24</title>
      <dc:creator>Nadine </dc:creator>
      <pubDate>Sat, 08 Nov 2025 22:22:01 +0000</pubDate>
      <link>https://dev.to/nadinev/agentic-bitcoin24-3946</link>
      <guid>https://dev.to/nadinev/agentic-bitcoin24-3946</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/tigerdata-2025-10-15"&gt;Agentic Postgres Challenge with Tiger Data&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  💡 What I Built
&lt;/h2&gt;

&lt;p&gt;I built &lt;strong&gt;Agentic Bitcoin24&lt;/strong&gt;, a Bitcoin price tracker that &lt;strong&gt;never goes down&lt;/strong&gt;, even when its primary data source fails. It's a growing database that gains value over time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Live Application:&lt;/strong&gt; &lt;a href="https://bitcoin24-delta.vercel.app/" rel="noopener noreferrer"&gt;Agentic Bitcoin24&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F29s4qxuy1v7h4us1mxf2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F29s4qxuy1v7h4us1mxf2.png" alt=" " width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  🎯 Zero-Downtime Resilience
&lt;/h3&gt;

&lt;p&gt;When the CoinGecko API fails (rate limits, outages, network issues), the site &lt;strong&gt;automatically falls back&lt;/strong&gt; to Tiger Data's TimescaleDB cache. Users never see an error (they don't even know the switch happened).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Benefits:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🎯 &lt;strong&gt;Zero Downtime&lt;/strong&gt; - Site stays live during external API outages&lt;/li&gt;
&lt;li&gt;💰 &lt;strong&gt;0.31% API Usage&lt;/strong&gt; - Only &lt;strong&gt;31 calls per month&lt;/strong&gt; vs 10,000 limit&lt;/li&gt;
&lt;li&gt;⚡ &lt;strong&gt;Instant Response&lt;/strong&gt; - Tiger Data cache = no external API latency&lt;/li&gt;
&lt;li&gt;🔄 &lt;strong&gt;Transparent Fallback&lt;/strong&gt; - Users are unaware of the data source switch&lt;/li&gt;
&lt;li&gt;📈 &lt;strong&gt;10-Year Sustainability&lt;/strong&gt; - Will run for the next decade on the free tier&lt;/li&gt;
&lt;/ul&gt;
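&lt;p&gt;The fallback itself is ordinary control flow: try the live API, and on any failure serve the newest cached row instead. A simplified sketch (the function names here are hypothetical stand-ins; the real app's cache read is a TimescaleDB query):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Simplified fallback sketch: fetch_live and read_cache stand in for
# the real CoinGecko HTTP call and the TimescaleDB query.
def get_price(fetch_live, read_cache):
    try:
        return {"price": fetch_live(), "source": "coingecko"}
    except Exception:
        # Rate limit, outage, timeout—any failure falls back silently.
        return {"price": read_cache(), "source": "cache"}

def rate_limited():
    raise TimeoutError("CoinGecko rate limited")

print(get_price(rate_limited, lambda: 97250.0))
# {'price': 97250.0, 'source': 'cache'}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The frontend only ever sees a price and a source label, which is why users never notice the switch.&lt;/p&gt;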




&lt;h2&gt;
  
  
  🛢️ How I Used Agentic Postgres
&lt;/h2&gt;

&lt;p&gt;Behind the scenes, &lt;strong&gt;three autonomous agents&lt;/strong&gt; manage the entire database lifecycle - no manual SQL required.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://youtu.be/aDuFx3NSBwk" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fothx6mdoakhviiirl8v5.png" alt="Watch the Demo" width="800" height="448"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  🎬 The Agent Collaboration Model
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Agent&lt;/th&gt;
&lt;th&gt;Responsibility&lt;/th&gt;
&lt;th&gt;Actions&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;1. Design Agent&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Agnostic database design and ingestion.&lt;/td&gt;
&lt;td&gt;• Reads external API response and automatically designs a matching SQL schema. • Creates general-purpose tables (e.g., standard SQL or JSONB) based on user input.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;2. Optimize Agent&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Transforms and tunes existing database.&lt;/td&gt;
&lt;td&gt;• Analyzes the Design Agent's generic schema for time-series patterns. • Enables TimescaleDB compression and implements automated compression policies. &lt;strong&gt;Safety Protocol:&lt;/strong&gt; • Applies changes like indexing or compression policies only after visual confirmation and user approval.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;3. Monitoring Agent&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Gathers database metrics.&lt;/td&gt;
&lt;td&gt;• Real-time API health checks. • Performance monitoring and visualization.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The agents autonomously:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Monitor API health&lt;/strong&gt; in real-time&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Switch tabs&lt;/strong&gt; (SQL Editor → Charts → API Monitor)
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Execute optimizations&lt;/strong&gt; (indexing, compression)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Visualize results&lt;/strong&gt; (Chart.js dashboards)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Provide safety guidance&lt;/strong&gt; before applying changes&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🏗️ The Workflow
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Daily Ingestion (Vercel Cron)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Fetch 24 hours of Bitcoin price data (1 API call)
2. Design Agent creates/updates schema automatically
3. Optimize Agent analyzes and tunes performance
4. TimescaleDB compression stores historical record
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
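&lt;p&gt;The ingestion step above can be sketched as a small transform. This is a hypothetical Python sketch: the payload shape mirrors CoinGecko's market-chart response, and &lt;code&gt;to_rows&lt;/code&gt; is an illustrative helper, not code from the project.&lt;/p&gt;

```python
# Hypothetical sketch of the daily ingestion step, assuming a CoinGecko-style
# payload: {"prices": [[timestamp_ms, usd_price], ...]} with 24 hourly points.
from datetime import datetime, timezone

def to_rows(payload):
    """Flatten the API payload into (timestamp, price) rows for insertion."""
    rows = []
    for ts_ms, price in payload["prices"]:
        ts = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
        rows.append((ts, float(price)))
    return rows

sample = {"prices": [[1735689600000, 93421.5], [1735693200000, 93515.2]]}
print(to_rows(sample)[0][1])  # 93421.5
```

&lt;p&gt;A single call like this yields all 24 rows for the day, which the Design Agent can then insert in one batch.&lt;/p&gt;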



&lt;p&gt;&lt;strong&gt;Real-Time Monitoring&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CoinGecko API Health Check (every 30s)
   ↓
✅ ONLINE  → Fetch fresh data
❌ OFFLINE → Automatic fallback to Tiger Data cache
   ↓
Zero downtime for users
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
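&lt;p&gt;The fallback path can be reduced to a few lines. This is a minimal sketch under stated assumptions: &lt;code&gt;fetch_live&lt;/code&gt; raises on an outage and &lt;code&gt;fetch_cached&lt;/code&gt; reads the Tiger Data copy; both names are hypothetical.&lt;/p&gt;

```python
# Minimal sketch of the transparent fallback: try the external API first,
# and on any failure serve the locally cached data instead.
def get_prices(fetch_live, fetch_cached):
    try:
        return fetch_live(), "live"
    except Exception:
        # External API offline: fall back to the Tiger Data cache.
        return fetch_cached(), "cache"

def broken_api():
    raise ConnectionError("CoinGecko unreachable")

data, source = get_prices(broken_api, lambda: [93421.5])
print(source)  # cache
```

&lt;p&gt;Because the caller only sees the returned data, users never notice which source answered.&lt;/p&gt;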






&lt;h2&gt;
  
  
  🛢️ How I Used Tiger Data + Claude
&lt;/h2&gt;

&lt;p&gt;I used &lt;strong&gt;Tiger CLI (MCP)&lt;/strong&gt; + &lt;strong&gt;Claude Code&lt;/strong&gt; to build the entire system without writing manual SQL:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tiger CLI helped agents learn TimescaleDB-specific operations (&lt;code&gt;convert_to_hypertable&lt;/code&gt;, &lt;code&gt;add_compression&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Claude Code refined the &lt;code&gt;create_zero_copy_fork&lt;/code&gt; logic and intelligent fallback strategies&lt;/li&gt;
&lt;li&gt;The agents operate in a &lt;strong&gt;chat interface&lt;/strong&gt; where I can say: &lt;em&gt;"Create a database for Bitcoin prices"&lt;/em&gt; and watch them work&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Constraint-Aware Optimization
&lt;/h3&gt;

&lt;p&gt;The Optimize Agent maximizes TimescaleDB's compression capabilities through deep reasoning about storage efficiency:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automatically enables compression with proper time-column ordering&lt;/li&gt;
&lt;li&gt;Implements compression policies (auto-compress data older than 30 days)&lt;/li&gt;
&lt;li&gt;Projects long-term capacity and recommends optimizations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When resource constraints prevent certain operations, the agent intelligently adapts by requiring user validation, ensuring all storage optimizations are reviewed before execution.&lt;/p&gt;
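&lt;p&gt;For reference, the compression setup described above boils down to two TimescaleDB statements. The table and column names here are hypothetical; the &lt;code&gt;timescaledb.compress&lt;/code&gt; setting and &lt;code&gt;add_compression_policy&lt;/code&gt; are standard TimescaleDB APIs.&lt;/p&gt;

```python
# The SQL the Optimize Agent would propose (shown as strings, since the agent
# presents them for user approval before execution). Table name is hypothetical.
TABLE = "btc_prices"

# Enable compression, ordering segments by the time column.
enable_compression = (
    f"ALTER TABLE {TABLE} SET ("
    "timescaledb.compress, "
    "timescaledb.compress_orderby = 'ts DESC');"
)

# Auto-compress chunks once they are older than 30 days.
policy = f"SELECT add_compression_policy('{TABLE}', INTERVAL '30 days');"

print(policy)
```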




&lt;h2&gt;
  
  
  📈 The 10-Year Sustainability Model
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The Math:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Free tier: 10,000 API calls/month&lt;/li&gt;
&lt;li&gt;My usage: 31 calls/month (0.31%)&lt;/li&gt;
&lt;li&gt;Sustainability: &lt;strong&gt;322 months = 26+ years&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why 10+ Years:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;With TimescaleDB compression enabled on the time-series data:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Daily Bitcoin prices (24 hourly data points) = ~2KB per day&lt;/li&gt;
&lt;li&gt;Compressed storage: ~730KB per year&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;750MB ÷ 730KB/year ≈ 1,027 years of compressed data&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But realistically, accounting for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Schema overhead&lt;/li&gt;
&lt;li&gt;Indexes and metadata&lt;/li&gt;
&lt;li&gt;Query logs&lt;/li&gt;
&lt;li&gt;Potential data expansion&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Conservative estimate: 10+ years&lt;/strong&gt; of continuous operation without hitting storage limits.&lt;/p&gt;
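&lt;p&gt;The arithmetic above checks out in a few lines:&lt;/p&gt;

```python
# Verifying the sustainability math from the figures above.
free_tier_calls = 10_000      # API calls per month on the free tier
my_calls = 31                 # actual calls per month

months_of_headroom = free_tier_calls // my_calls
print(months_of_headroom)        # 322
print(months_of_headroom // 12)  # 26  (years)

# Storage: ~730 KB of compressed data per year against a 750 MB quota.
years_of_storage = (750 * 1000) // 730
print(years_of_storage)          # 1027
```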




&lt;h2&gt;
  
  
  🌟 Overall Experience
&lt;/h2&gt;

&lt;p&gt;Most apps &lt;strong&gt;fail gracefully&lt;/strong&gt;; this one &lt;strong&gt;doesn't fail at all&lt;/strong&gt;.&lt;br&gt;
I solved the data volatility problem by providing clean, 24-hour historical Bitcoin data: not by collecting data 24/7, but by ingesting 24 hourly data points once per day.&lt;/p&gt;

&lt;p&gt;The system is safe to run indefinitely and will store relevant data for &lt;strong&gt;10+ years&lt;/strong&gt; while costing &lt;strong&gt;nothing&lt;/strong&gt; to maintain.&lt;/p&gt;

&lt;p&gt;I basically hired agents who work for free and never sleep! 🎉&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>agenticpostgreschallenge</category>
      <category>ai</category>
      <category>postgres</category>
    </item>
    <item>
      <title>How I Built a Secret Agent</title>
      <dc:creator>Nadine </dc:creator>
      <pubDate>Sat, 25 Oct 2025 15:59:48 +0000</pubDate>
      <link>https://dev.to/nadinev/how-i-built-a-secret-agent-4p48</link>
      <guid>https://dev.to/nadinev/how-i-built-a-secret-agent-4p48</guid>
      <description>&lt;p&gt;I recently made an accidental but interesting discovery while building an app. I managed to create an agent-like system using nothing more than Gemini's function calling feature, effectively building an agent’s brain without the traditional, continuous infrastructure required to host a full agent.&lt;/p&gt;

&lt;p&gt;The key finding❓ This $0/hr serverless approach not only significantly reduced infrastructure costs but also proved to be a far more helpful debugger than the broad, general-purpose agent provided by my IDE.&lt;/p&gt;




&lt;h2&gt;
  
  
  ֎ Persistent Agents
&lt;/h2&gt;

&lt;p&gt;Traditional AI agents (which I call Persistent Agents) require continuous hosting using managed services and underlying infrastructure. Big tech companies are offering impressive designer spaces and no-code interfaces, but this can quickly become prohibitively expensive.&lt;/p&gt;

&lt;p&gt;The issue lies in the idle cost. Immediately upon deployment, infrastructure is required to host the agent. Even if the agent is inactive or receiving no traffic, at least one compute node is required to run the service, and these costs are incurred continuously, often hourly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;So what exactly does this buy you, anyway?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A persistent agent is generally equipped with tools and can use them to perform:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Complex, multi-step reasoning.&lt;/li&gt;
&lt;li&gt;Dynamic decision-making on when and how to call tools.&lt;/li&gt;
&lt;li&gt;Management of long-running conversational memory.&lt;/li&gt;
&lt;li&gt;External actions, like authenticating on your behalf (when permission is granted).&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🛠️ Function Calling as Your Agent
&lt;/h2&gt;

&lt;p&gt;I realised that for my application's specific workflows, the most valuable part of an agent was its dynamic reasoning and ability to use tools, not its continuous hosting status; I had no need for external actions.&lt;/p&gt;

&lt;p&gt;I decided to capture the core functionality of an agent without the overhead of continuous deployment. I applied tool-use logic directly via Gemini’s function calling. The tools themselves, including the logic for search, retrieval, etc., are hardcoded into my conversational frontend.&lt;/p&gt;

&lt;p&gt;The AI's role becomes the &lt;em&gt;Stateless Agent 🧠&lt;/em&gt;. It uses function calling to translate the user’s natural language query into a structured function call and arguments. &lt;/p&gt;

&lt;p&gt;The application executes the call, and the resulting data is sent back to the model for a natural language response to the user.&lt;/p&gt;

&lt;p&gt;Since I am already making calls to the Gemini model for text generation and other things, this method allows me to combine the reasoning and response steps into a single API call, reducing the transaction cost. This is how I anticipate achieving an 80% reduction in operating costs compared to maintaining a persistent agent infrastructure.&lt;/p&gt;
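&lt;p&gt;The loop described above can be sketched in a framework-agnostic way: the model returns a structured call, the application dispatches it to a hardcoded tool, and the result goes back to the model for phrasing. The &lt;code&gt;fuzzy_search&lt;/code&gt; tool and the registry below are hypothetical stand-ins, not my actual implementation.&lt;/p&gt;

```python
# Sketch of the stateless-agent dispatch step. The tools are hardcoded in the
# frontend; the model only chooses which one to call and with what arguments.
def fuzzy_search(query):
    """Hypothetical fallback tool: naive substring match over a tiny corpus."""
    corpus = ["vector search basics", "CORS configuration guide"]
    return [doc for doc in corpus if query.lower() in doc.lower()]

TOOLS = {"fuzzy_search": fuzzy_search}

def execute_call(call):
    """Dispatch a model-produced function call to a registered tool."""
    fn = TOOLS[call["name"]]
    return fn(**call["args"])

# Simulate the model emitting a structured call for a user query.
model_call = {"name": "fuzzy_search", "args": {"query": "cors"}}
print(execute_call(model_call))  # ['CORS configuration guide']
```

&lt;p&gt;The tool's return value is then passed back to the model in the same conversation turn, which is what lets the reasoning and response steps share one API call.&lt;/p&gt;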




&lt;h2&gt;
  
  
  🪲 How I Discovered My Agent
&lt;/h2&gt;

&lt;p&gt;My application is designed to fall back to a fuzzy text-matching search when vector search fails. I was coding in my IDE with a popular code assistant model running. Yet, my search pipeline was failing, and the IDE agent could not find the issue. It was writing new unit tests that were passing in the development environment but failing repeatedly in production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The agent was overcomplicating things&lt;/strong&gt;, drowning in the specifics of the code, unit tests, and the immediate task. Each time I summarised the issue, its lack of persistent memory about the operational environment made it feel like I was talking to a blank slate.&lt;/p&gt;

&lt;p&gt;Finally, in sheer desperation, I ran my own application’s frontend and typed into the message input: “&lt;em&gt;What is the problem??&lt;/em&gt;”&lt;/p&gt;

&lt;p&gt;The response from my little agent's brain was immediate and shockingly direct. It informed me that it could not communicate with the backend and, therefore, could not perform the search function it was supposed to execute.&lt;/p&gt;

&lt;p&gt;The issue, it turned out, was a simple CORS policy error preventing the frontend from communicating with the backend. The traditional IDE agent was trapped in code complexity; my function-calling agent could immediately identify what was wrong.&lt;/p&gt;




&lt;h2&gt;
  
  
  🔒 The Security Lesson in Focus
&lt;/h2&gt;

&lt;p&gt;This unexpected diagnostic capability is actually due to its architectural limitations. The agent was forced to reason only about the predefined tool functions available in its system instructions.&lt;/p&gt;

&lt;p&gt;I then asked it how it was performing the search. It began referencing internal file paths and implementation details. This was an unintended data leak because I had not provided specific instructions or response settings on how to constrain its reply.&lt;/p&gt;

&lt;p&gt;That’s the real value of the &lt;em&gt;Stateless Agent&lt;/em&gt;: it lives intrinsically inside the code's purpose, defined solely by the functions it is permitted to use. It doesn't need vast context; it needs focused context. &lt;/p&gt;

&lt;p&gt;The biggest takeaway from this experiment is that the most valuable tooling isn't necessarily a massive, stateful "IDE Agent" that watches your every keystroke. Instead, there is real value in composing stateless, focused expert agents that live intrinsically inside the purpose of the code. &lt;/p&gt;

</description>
      <category>agents</category>
      <category>serverless</category>
      <category>architecture</category>
      <category>llm</category>
    </item>
  </channel>
</rss>
