Google's Gemini 3, released November 18, 2025, brings multimodal reasoning and tool use for building agentic AI applications. In this tutorial, you'll combine it with Decart AI and a few real-time voice and transport services to turn casual voice commands into artistic live video style changes, with almost no extra scaffolding.
Pair it with Decart AI's Mirage LSD, the first live-stream diffusion model for zero-latency video restyling at 24 FPS and <40ms per frame, and you can build an agent that instantly applies artistic styles (Neon Nostalgia, Studio Ghibli, Cyberpunk) to your camera feed in response to voice prompts.
Combining the two with speech recognition (STT) and speech synthesis (TTS) models, you can spin up a real-time demo that turns your webcam into an infinite, temporally coherent art generator in under five minutes.
In this demo, the agent restyles the live camera feed from "Neon Nostalgia" to "Studio Ghibli" to "War Zone" in response to voice commands, all with seamless, real-time transitions and no lag.
Here's exactly how to build the same agent yourself. You may also watch this step-by-step YouTube tutorial to create the demo in under 9 minutes.
What You'll Build
In just a few minutes, create a real-time video restyling agent that transforms your camera feed into artistic styles via voice prompts.
The stack:
LLM → Gemini 3 Pro (via the Google API) for prompt understanding and agentic control
Video processing → Decart AI (Mirage LSD for zero-latency restyling)
Speech-to-text (STT) → DeepGram
Text-to-speech (TTS) → ElevenLabs
Real-time audio/video transport → Stream
Agent framework → Vision Agents (open source)
Requirements (API Keys)
You'll need API keys from:
Stream (WebRTC for low-latency transport)
Google (for Gemini 3 access)
Decart AI (video restyling API)
ElevenLabs (TTS)
DeepGram (STT)
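You can export these keys in your shell (see Step 3) or keep them in a .env file at the project root, which main.py picks up via load_dotenv(). The key names below match the exports used later in this tutorial; a minimal .env might look like this:

# .env (project root) — loaded by load_dotenv() in main.py
GOOGLE_API_KEY=your_key
DECART_API_KEY=your_key
ELEVENLABS_API_KEY=your_key
DEEPGRAM_API_KEY=your_key
STREAM_API_KEY=your_key
STREAM_API_SECRET=your_secret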
Step 1: Set Up the Python Project
# Initialize a Python project
uv init realtime-video-restyling
cd realtime-video-restyling
# Activate your environment
uv venv && source .venv/bin/activate
# Install Vision Agents and required plugins
uv add vision-agents
uv add "vision-agents[getstream, gemini, elevenlabs, deepgram]"
# Install the Decart AI plugin
uv pip install vision-agents-plugins-decart
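Before moving on, you can sanity-check the installation by importing the framework and the plugins used in this tutorial; if any import fails, re-run the corresponding uv add / uv pip install command above. A quick check, as a throwaway script:

# check_install.py — verify the framework and plugins import cleanly
from vision_agents.core import Agent, User, cli
from vision_agents.plugins import decart, getstream, gemini, elevenlabs, deepgram

print("All Vision Agents plugins imported successfully.")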
Step 2: Full Working Code (main.py)
In the root of the generated uv project, replace the contents of main.py with the following code.
import logging

from dotenv import load_dotenv

from vision_agents.core import User, Agent, cli
from vision_agents.core.agents import AgentLauncher
from vision_agents.plugins import decart, getstream, gemini, elevenlabs, deepgram

logger = logging.getLogger(__name__)

load_dotenv()


async def create_agent(**kwargs) -> Agent:
    processor = decart.RestylingProcessor(
        initial_prompt="Change the video style to a cute animated movie with vibrant colours",
        model="mirage_v2",
    )
    llm = gemini.LLM(model="gemini-3-pro-preview")
    agent = Agent(
        edge=getstream.Edge(),
        agent_user=User(name="Story teller", id="agent"),
        instructions="You will use the Decart processor to change the style of the video and the user's background.",
        llm=llm,
        tts=elevenlabs.TTS(voice_id="N2lVS1w4EtoT3dr4eOWO"),
        stt=deepgram.STT(),
        processors=[processor],
    )

    @llm.register_function(
        description="This function changes the prompt of the Decart processor which in turn changes the style of the video and user's background"
    )
    async def change_prompt(prompt: str) -> str:
        await processor.update_prompt(prompt)
        return f"Prompt changed to {prompt}"

    return agent


async def join_call(agent: Agent, call_type: str, call_id: str, **kwargs) -> None:
    """Join the call and start the agent."""
    # Ensure the agent user is created
    await agent.create_user()

    # Create a call
    call = await agent.create_call(call_type, call_id)

    logger.info("🤖 Starting Agent...")

    # Have the agent join the call/room
    with await agent.join(call):
        logger.info("Joining call")
        logger.info("LLM ready")
        await agent.finish()  # Run till the call ends


if __name__ == "__main__":
    cli(AgentLauncher(create_agent=create_agent, join_call=join_call))
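The change_prompt function registered via @llm.register_function is the agentic core of the demo: Gemini 3 reads the function description, decides when a voice command calls for a style change, and invokes the function with a restyling prompt, which Decart then applies to the live feed. You can expose additional tools to the model the same way. As a hedged sketch (the list_styles helper and its preset names are illustrative, not part of the Decart API), inside create_agent you might add:

    @llm.register_function(
        description="List a few example styles the user can ask for"
    )
    async def list_styles() -> str:
        # Purely illustrative preset names; any free-form prompt works with Decart
        return "Neon Nostalgia, Studio Ghibli, Cyberpunk, War Zone"

Gemini can then call this tool when someone asks what styles are available and read the answer back through the ElevenLabs TTS voice.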
Step 3: Run It
Run the following commands in your terminal to export the required API credentials and start the script. Alternatively, add the API keys to a .env file in the project root, as shown in the Requirements section above.
export GOOGLE_API_KEY=your_key
export DECART_API_KEY=your_key
export ELEVENLABS_API_KEY=your_key
export DEEPGRAM_API_KEY=your_key
export STREAM_API_KEY=your_key
export STREAM_API_SECRET=your_secret
cd realtime-video-restyling
uv run main.py
A browser tab opens with a video call interface and joins you to the call automatically. Allow camera and microphone access, say "Make my video Studio Ghibli", and watch your camera feed transform live!
Example interaction from the video:
You: "Make it Neon Nostalgia."
Agent: "OK, I've updated the video style to Neon Nostalgia."
You: "Make it a War Zone."
Agent: "OK, I've updated the video style to a War Zone."
What Makes This Stack So Powerful
This stack is one of the fastest ways for developers to ship a fully featured, low-latency video AI agent, in pure Python and under 100 lines.
Vision Agents and its integrated voice AI models abstract away turn detection, streaming, and interruption handling. Google's Gemini 3 brings agentic reasoning for prompt interpretation, and Decart's production-proven API delivers <40ms restyling without losing temporal coherence.
It's open-source, local-first (except API calls), and scalable from prototype to production.
Links & Resources
Give it a spin and see what wild style you like best. Maybe... post-apocalyptic Paris or Van Gogh's starry night? 🎨
