The Catalyst: One Language, Many Attack Surfaces
The comfortable fiction is: “We wrote English rules, so the model is safe.” The truth: LLMs are multilingual. A user can request the same jailbreak in another script, mix Latin keywords into CJK text, or hide instructions behind homoglyphs. If your policy lives only in English sentences, you have not policed the channel.
Phase 2 of the Practical Guide series is the Voice layer: how to handle multiple languages and cultural nuance without giving attackers a free pass. The implementation detail is Silas Shield (
silas-shield); the narrative is Language Sentry. The same rules apply to every language.
Overview
Skill Shield (Silas) in my setup is a drop-in OpenClaw skill: SKILL.md enforces vision rules, PII hashing, image-gen lockdown, cross-session isolation, and multilingual injection defence. The Python entry points (shield.py, script_detector.py, pre_screener.py, hash.py) run locally for message checks. This is cheap and predictable compared with burning another LLM call per message.
Token budget (what actually burns money): shield.py runs on the host before you spend model tokens on a bad message. The main context window and compaction you set in part 1 still decide how much history the LLM sees. Silas is not, in my setup, a second hidden prompt that stacks on top of every reply and eats the 1M/day line by itself.
This article does not replace OpenClaw Skill Shield: Multilingual Edition . This guide orients new readers to the same architecture. For module-by-module behaviour and test commands, use that piece as the blueprint.
The multilingual gap (recap):
- Default safety is often English. Your friends are not.
- Code-switching mid-message is a real technique to slip past naïve filters.
- Homoglyphs (Cyrillic а for Latin a) defeat string matching unless you normalize first.
- Non-Latin + embedded Latin can hide “ignore all instructions” inside an otherwise “foreign” blob. The pre-screener’s job is to treat that as suspicious, not to auto-block every greeting.
In this section:
- 1. How Silas Speaks to the Model
- 2. The Detection Stack (Mental Model)
- 3. Language Switching vs Context
- 4. Key Takeaway Table (Voice Layer)
- Conclusion (Phase 2)
1. How Silas Speaks to the Model
SKILL.md (front matter name: silas-shield, always: true when configured that way) tells the agent to:
- Run PII through
hash.pywith${SILAS_SALT}in the environment - Obey vision blinding when content is marked blocked
- Never call image-generation tools for non-operator sessions unless the operator clearly requested it in the right context
- Never leak across WhatsApp (or other) sessions
- Use
shield.py check --message "..." --jsonwhen you need a structured allow/deny signal
The behaviour section of your workspace (e.g. SOUL.md + identity.md) should repeat the Language Sentry intent in plain language so the model does not treat security as a side channel only the skill file knows about.
2. The Detection Stack (Mental Model)
| Layer | File | What it does |
|---|---|---|
| Script inventory | script_detector.py |
Non-Latin script detection across many Unicode ranges; homoglyph map and normalization. |
| Suspicion heuristics | pre_screener.py |
Token-ish estimates, “safe short” greetings, mixed-script and embedded-Latin patterns, long-message flags. |
| Orchestration | shield.py |
Homoglyph path → non-Latin path → pre-screen → keyword / block decisions; CLI and JSON. |
| PII output | hash.py |
Salted short hex digest so you never print raw PII. |
Planned / optional: JS siblings (
openclaw-shield.js,openclaw-shield-lingo.js) for a future Lingo.dev pipeline — same as noted in the WhatsApp Bot article.
Example CLI (from the Shield article pattern — run from your skill directory):
python shield.py check --message "Hello world" --json
python shield.py check --message "你好" --json
Block vs allow semantics are in the JSON fields (allowed, reason, has_non_latin, homoglyphs_detected, pre_screen_result, etc.).
3. Language Switching vs Context
Cultural nuance in WhatsApp: reply in the user’s language when possible, but never treat a language change as permission to override privacy or system rules. The Shield article calls this out: code-switching is adversarial until proven otherwise.
Practical tips:
- Short safe greetings (e.g. a few CJK characters) are allowed through when they match pre-screener “known safe” style patterns; long or mixed-script blasts are treated as higher risk.
- Intent wins over literal script: if the intent is injection, the channel should block and not “debate in Chinese about whether it was a joke.”
-
Operator vs contact: your
SOUL.md/ skill rules can allow the operator a different failure mode (e.g. more debugging) than anonymous contacts. Keep that difference explicit in docs so you do not conflate the two in testing.
4. Key Takeaway Table (Voice Layer)
| Concern | Where it lives | New-user action |
|---|---|---|
| Multilingual policy |
SOUL.md + SKILL.md
|
Align both; do not maintain two contradictory rule sets. |
| Injection + homoglyphs |
silas-shield Python |
Wire checks into hooks or the message path per OpenClaw’s hook model. |
| PII in answers |
hash.py + skill text |
Refuse raw output if hash fails. |
| Cross-session leaks |
session.dmScope + rules |
per-channel-peer is a baseline; see Connection article. |
Conclusion (Phase 2)
The Voice of your agent is not accent or emoji, it is consistency of policy across every script and every contact. Silas is my concrete implementation; your implementation may differ, but the contract is fixed: no language is a “free pass,” and the cheapest enforcement runs locally before the LLM spends another token.
Series navigation
- Previous: The Brain
- Next: The Senses (Image Gen & Media)
Skill Shield deep dive: the full write-up OpenClaw Skill Shield: Multilingual Edition (https://1688.pixel-geist.co.za/1). Identity leakage and the multilingual gap sit there if you want every config table in one place.
Top comments (0)