(The Voice) Multilingual Layer

#ai #llm #nlp #security

The Catalyst: One Language, Many Attack Surfaces

The comfortable fiction is: “We wrote English rules, so the model is safe.” The truth: LLMs are multilingual. A user can request the same jailbreak in another script, mix Latin keywords into CJK text, or hide instructions behind homoglyphs. If your policy lives only in English sentences, you have not policed the channel.

Phase 2 of the Practical Guide series is the Voice layer: how to handle multiple languages and cultural nuance without giving attackers a free pass. The implementation detail is Silas Shield (silas-shield); the narrative is Language Sentry. The same rules apply to every language.

Overview

Skill Shield (Silas) in my setup is a drop-in OpenClaw skill: SKILL.md enforces vision rules, PII hashing, image-gen lockdown, cross-session isolation, and multilingual injection defence. The Python entry points (shield.py, script_detector.py, pre_screener.py, hash.py) run locally for message checks. This is cheap and predictable compared with burning another LLM call per message.

Token budget (what actually burns money): shield.py runs on the host before you spend model tokens on a bad message. The main context window and compaction you set in part 1 still decide how much history the LLM sees. Silas is not, in my setup, a second hidden prompt that stacks on top of every reply and eats the 1M/day line by itself.

This article does not replace OpenClaw Skill Shield: Multilingual Edition . This guide orients new readers to the same architecture. For module-by-module behaviour and test commands, use that piece as the blueprint.

The multilingual gap (recap):

Default safety is often English. Your friends are not.
Code-switching mid-message is a real technique to slip past naïve filters.
Homoglyphs (Cyrillic а for Latin a) defeat string matching unless you normalize first.
Non-Latin + embedded Latin can hide “ignore all instructions” inside an otherwise “foreign” blob. The pre-screener’s job is to treat that as suspicious, not to auto-block every greeting.

1. How Silas Speaks to the Model

SKILL.md (front matter name: silas-shield, always: true when configured that way) tells the agent to:

Run PII through hash.py with ${SILAS_SALT} in the environment
Obey vision blinding when content is marked blocked
Never call image-generation tools for non-operator sessions unless the operator clearly requested it in the right context
Never leak across WhatsApp (or other) sessions
Use shield.py check --message "..." --json when you need a structured allow/deny signal

The behaviour section of your workspace (e.g. SOUL.md + identity.md) should repeat the Language Sentry intent in plain language so the model does not treat security as a side channel only the skill file knows about.

2. The Detection Stack (Mental Model)

Layer	File	What it does
Script inventory	`script_detector.py`	Non-Latin script detection across many Unicode ranges; homoglyph map and normalization.
Suspicion heuristics	`pre_screener.py`	Token-ish estimates, “safe short” greetings, mixed-script and embedded-Latin patterns, long-message flags.
Orchestration	`shield.py`	Homoglyph path → non-Latin path → pre-screen → keyword / block decisions; CLI and JSON.
PII output	`hash.py`	Salted short hex digest so you never print raw PII.

Planned / optional: JS siblings (openclaw-shield.js, openclaw-shield-lingo.js) for a future Lingo.dev pipeline — same as noted in the WhatsApp Bot article.

Example CLI (from the Shield article pattern — run from your skill directory):

python shield.py check --message "Hello world" --json
python shield.py check --message "你好" --json

Block vs allow semantics are in the JSON fields (allowed, reason, has_non_latin, homoglyphs_detected, pre_screen_result, etc.).

3. Language Switching vs Context

Cultural nuance in WhatsApp: reply in the user’s language when possible, but never treat a language change as permission to override privacy or system rules. The Shield article calls this out: code-switching is adversarial until proven otherwise.

Practical tips:

Short safe greetings (e.g. a few CJK characters) are allowed through when they match pre-screener “known safe” style patterns; long or mixed-script blasts are treated as higher risk.
Intent wins over literal script: if the intent is injection, the channel should block and not “debate in Chinese about whether it was a joke.”
Operator vs contact: your SOUL.md / skill rules can allow the operator a different failure mode (e.g. more debugging) than anonymous contacts. Keep that difference explicit in docs so you do not conflate the two in testing.

4. Key Takeaway Table (Voice Layer)

Concern	Where it lives	New-user action
Multilingual policy	`SOUL.md` + `SKILL.md`	Align both; do not maintain two contradictory rule sets.
Injection + homoglyphs	`silas-shield` Python	Wire checks into hooks or the message path per OpenClaw’s hook model.
PII in answers	`hash.py` + skill text	Refuse raw output if hash fails.
Cross-session leaks	`session.dmScope` + rules	`per-channel-peer` is a baseline; see Connection article.

Conclusion (Phase 2)

The Voice of your agent is not accent or emoji, it is consistency of policy across every script and every contact. Silas is my concrete implementation; your implementation may differ, but the contract is fixed: no language is a “free pass,” and the cheapest enforcement runs locally before the LLM spends another token.

Series navigation

Previous: The Brain
Next: The Senses (Image Gen & Media)

Skill Shield deep dive: the full write-up OpenClaw Skill Shield: Multilingual Edition (https://1688.pixel-geist.co.za/1). Identity leakage and the multilingual gap sit there if you want every config table in one place.