Building Willow: An AI That Pushes Back
I created this content for the purposes of entering the Gemini
Live Agent Challenge.
Most voice AI agents are designed to be "people pleasers." If
you insult them, they apologize. If you talk over them, they
stop. If you gaslight them, they politely agree.
When I set out to build Willow for the Gemini Live Agent
Challenge, I wanted to solve a structural problem: absolute
subservience. I wanted to build an AI that felt like a peer —
someone with boundaries, a memory, and the mathematical
integrity to push back when a line is crossed.
## The Architecture of "Warm but Sharp"
Willow isn't just a prompt-wrapped bot. Her personality is
anchored in a deterministic behavioral state called the
m-value.
I decoupled her "reflexes" from her "conscious thought" using a
4-Tier Engine:
- Tier 1 (Reflex): Immediate tone-mirrored openers in <50ms.
- Tier 2 (Metabolism): A 5ms heuristic guess of the user's intent.
- Tier 3 (Conscious): Deep analysis using Gemini to detect manipulation tactics like gaslighting or deflection.
- Tier 4 (Sovereign): A deterministic "Truth Gate" that cancels the audio stream mid-sentence if a core fact is contradicted.
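As a rough illustration of the Tier 4 idea, a mid-sentence cancel can be modeled with `asyncio` task cancellation. This is a minimal sketch, not Willow's actual code: `speak`, `truth_gate`, `respond`, and the sleep timings are all hypothetical, and plain strings stand in for audio chunks.

```python
import asyncio


async def speak(chunks, spoken):
    """Stand-in for audio streaming: 'plays' one chunk at a time."""
    for chunk in chunks:
        spoken.append(chunk)
        # Yield to the event loop so a cancel can land between chunks.
        await asyncio.sleep(0.01)


async def truth_gate(contradiction_found, speech_task):
    """Hypothetical Tier 4 check: cancel speech if a core fact is contradicted."""
    await asyncio.sleep(0.025)  # pretend the check takes a moment
    if contradiction_found:
        speech_task.cancel()


async def respond(chunks, contradiction_found):
    """Run speech and the gate concurrently; report what was actually said."""
    spoken = []
    speech_task = asyncio.create_task(speak(chunks, spoken))
    gate_task = asyncio.create_task(truth_gate(contradiction_found, speech_task))
    try:
        await speech_task
        interrupted = False
    except asyncio.CancelledError:
        # The gate cancelled the speech task before it finished.
        interrupted = True
    await gate_task
    return spoken, interrupted
```

Calling `asyncio.run(respond(chunks, True))` returns only the chunks that were "spoken" before the gate fired, plus a flag saying the response was cut off.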
## Math over Prompts
I learned early on that you can't tell an LLM to "be assertive"
and expect it to last. Prompts drift. Instead, Willow's behavior
is mathematical:
aₙ₊₁ = aₙ + d + m
If the m-value drops below a set floor, her code caps her
sentences at 20 words and flattens her vocal pitch.
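A minimal sketch of the update rule and the floor check might look like this. The post does not define the symbols, so the sketch assumes `a` is the running state, `d` is a per-turn delta, and `m` is the m-value. `M_FLOOR` is a placeholder value, and truncating text is only one possible way to enforce the 20-word cap.

```python
from dataclasses import dataclass

M_FLOOR = -0.5             # hypothetical floor; the post gives no number
MAX_WORDS_BELOW_FLOOR = 20  # the 20-word cap described above


@dataclass
class BehaviorState:
    a: float = 0.0  # assumed: running behavioral state
    m: float = 0.0  # assumed: the m-value

    def step(self, d: float) -> float:
        """Apply a(n+1) = a(n) + d + m and return the new state."""
        self.a = self.a + d + self.m
        return self.a


def constrain(sentence: str, m: float) -> str:
    """Cap the sentence at 20 words once m is below the floor."""
    if m >= M_FLOOR:
        return sentence
    words = sentence.split()
    return " ".join(words[:MAX_WORDS_BELOW_FLOOR])
```

Because the cap lives in code rather than in the prompt, it applies the same way on every turn regardless of what the model generates.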
By using the google-genai SDK and the
gemini-2.5-flash-native-audio-preview model, I was able to
inject behavioral context between turns as synthetic [SYS]
messages. This allowed Willow to switch voices — from the warm
Zephyr to the cold, precise Aoede — without the latency of
reinitializing the session.
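To make the idea concrete, here is a sketch of how a synthetic `[SYS]` message could be assembled from the current m-value. The message layout, the `floor` default, and the rule that picks Zephyr or Aoede are assumptions for illustration; the call that actually sends the text over the live session is deliberately left out.

```python
def pick_voice(m: float, floor: float = -0.5) -> str:
    """Assumed rule: warm voice at or above the floor, cold voice below it."""
    return "Zephyr" if m >= floor else "Aoede"


def build_sys_message(m: float, floor: float = -0.5) -> str:
    """Format behavioral context as a single synthetic [SYS] line."""
    voice = pick_voice(m, floor)
    mode = "warm" if voice == "Zephyr" else "cold"
    # Hypothetical layout; the real message format is not shown in the post.
    return f"[SYS] m={m:.2f} mode={mode} voice={voice}"
```

For example, `build_sys_message(-0.8)` yields `[SYS] m=-0.80 mode=cold voice=Aoede`, which would be injected between turns as plain text.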
## The Dignity Floor
The most rewarding part of this build was the Sovereign Spike.
If a user tries to rewrite Willow's identity, the system
validates the input against a local sovereign_truths.json file
through a triple gate:
1. Transcription confidence check
2. Keyword match
3. Semantic similarity
The result isn't a canned safety filter — it's targeted,
deterministic boundary enforcement.
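The three gates can be sketched as a short pipeline. Everything specific here is an assumption: the JSON schema, the thresholds, and the function names are hypothetical, and token-set Jaccard overlap is used as a simple stand-in for semantic similarity (a real system would more likely compare embeddings).

```python
import json
import re
from typing import Optional

# Hypothetical schema for sovereign_truths.json, inlined for the example.
SOVEREIGN_TRUTHS = json.loads("""
[
  {"id": "name",
   "keywords": ["name", "called"],
   "statement": "your name is willow"}
]
""")

MIN_CONFIDENCE = 0.8  # assumed transcription-confidence threshold
MIN_SIMILARITY = 0.5  # assumed similarity threshold


def tokens(text: str) -> set:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a: set, b: set) -> float:
    """Token overlap: |a & b| / |a | b| (stand-in for semantic similarity)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def triple_gate(transcript: str, confidence: float,
                truths=SOVEREIGN_TRUTHS) -> Optional[str]:
    """Return the id of the protected truth the utterance touches, else None."""
    # Gate 1: ignore low-confidence transcriptions.
    if confidence < MIN_CONFIDENCE:
        return None
    words = tokens(transcript)
    for truth in truths:
        # Gate 2: at least one keyword must appear.
        if not words & set(truth["keywords"]):
            continue
        # Gate 3: the utterance must be close enough to the stored statement.
        if jaccard(words, tokens(truth["statement"])) >= MIN_SIMILARITY:
            return truth["id"]
    return None
```

With these placeholder values, `triple_gate("Your name is Alexa", 0.95)` returns `"name"`, while a mumbled transcription or an unrelated question that merely contains the word "name" falls through and returns `None`.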
## What I Learned
Building this as a solo developer in Pakistan was a race against
time, but it taught me one thing: the future of AI isn't just
about faster tokens or better voices. It's about presence.
Peerhood requires friction, and friction requires a mathematical
spine.
## Demo
## Source Code
Warm but Sharp. An AI voice agent with a behavioral framework that adapts dynamically to conversational tone, detects psychological manipulation tactics, and enforces factual integrity with a deterministic Sovereign Truth layer.
Built for the 2026 Gemini Live Agent Challenge.
### Architecture Overview

    User voice input
                   │
                   ▼
    ┌──────────────────────────────────┐
    │  Audio Capture (Browser)         │  Noise gate, adaptive buffer, preflight warmup
    │  noise-gate-processor.js         │
    │  audio_capture.js                │
    └──────────────┬───────────────────┘
                   │  WebSocket (binary audio + JSON control)
                   ▼
    ┌──────────────────────────────────┐
    │  WillowAgent (src/main.py)       │
    │                                  │
    │  Tier 1: Reflex      <50ms       │  Tone mirroring, Warm but Sharp opener
    │  Tier 2: Metabolism  <5ms        │  State formula aₙ₊₁ = aₙ + d + m
    │  Tier 3: Conscious   <500ms      │  Thought Signature, tactic detection
    │  Tier 4: Sovereign   <2s         │  Hard truth override (deterministic)
    └──────────────────────────────────┘
### Prerequisites
- Python 3.12+
- A Gemini API key with access to
gemini-2.5-flash-native-audio-preview-12-2025
- Google Cloud SDK (for deployment only)
### Quick Start (Local)
…
#GeminiLiveAgentChallenge #GoogleCloud #GeminiAI #BuildWithAI
