Google's Gemini 3, released November 18, 2025, brings multimodal reasoning and tool use for building agentic AI applications. In this tutorial, you'll combine it with Decart AI and a few real-time voice and transport services to turn casual voice commands into artistic live video style changes, with almost no extra scaffolding.
Pair it with Decart AI's Mirage LSD, the first live-stream diffusion model for zero-latency video restyling at 24 FPS and <40ms per frame, and you can build an agent that instantly applies artistic styles (Neon Nostalgia, Studio Ghibli, Cyberpunk) to your camera feed in response to voice prompts.
Combining the two with speech recognition (STT) and speech synthesis (TTS) models, you can spin up a real-time demo that turns your webcam into an infinite, temporally coherent art generator in under five minutes.
In this demo, the agent restyles the live camera feed from "Neon Nostalgia" to "Studio Ghibli" to "War Zone" in response to voice commands, all with seamless, real-time transitions and no lag.
Here's exactly how to build the same agent yourself. You may also watch this step-by-step YouTube tutorial to create the demo in under 9 minutes.
What You'll Build
In just a few minutes, create a real-time video restyling agent that transforms your camera feed into artistic styles via voice prompts.
The stack:
LLM → Gemini 3 Pro (via the Google API) for prompt understanding and agentic control
Video processing → Decart AI (Mirage LSD for zero-latency restyling)
Speech-to-text (STT) → DeepGram
Text-to-speech (TTS) → ElevenLabs
Real-time audio/video transport → Stream
Agent framework → Vision Agents (open source)
Requirements (API Keys)
You'll need API keys from:
Stream (WebRTC for low-latency transport)
Google (for Gemini 3 access)
Decart AI (video restyling API)
ElevenLabs (TTS)
DeepGram (STT)
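You can export these keys in your shell (see Step 3) or keep them in a .env file at the project root, which main.py picks up via load_dotenv(). The key names below match the exports used later in this tutorial; a minimal .env might look like this:

# .env (project root) — loaded by load_dotenv() in main.py
GOOGLE_API_KEY=your_key
DECART_API_KEY=your_key
ELEVENLABS_API_KEY=your_key
DEEPGRAM_API_KEY=your_key
STREAM_API_KEY=your_key
STREAM_API_SECRET=your_secret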
Step 1: Set Up the Python Project
# Initialize a Python project
uv init realtime-video-restyling
cd realtime-video-restyling
# Activate your environment
uv venv && source .venv/bin/activate
# Install Vision Agents and required plugins
uv add vision-agents
uv add "vision-agents[getstream, gemini, elevenlabs, deepgram]"
# Install the Decart AI plugin
uv pip install vision-agents-plugins-decart
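Before moving on, you can sanity-check the installation by importing the framework and the plugins used in this tutorial; if any import fails, re-run the corresponding uv add / uv pip install command above. A quick check, as a throwaway script:

# check_install.py — verify the framework and plugins import cleanly
from vision_agents.core import Agent, User, cli
from vision_agents.plugins import decart, getstream, gemini, elevenlabs, deepgram

print("All Vision Agents plugins imported successfully.")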
Step 2: Full Working Code (main.py)
In the root of the generated uv project, replace the contents of main.py with the following code.
import logging

from dotenv import load_dotenv

from vision_agents.core import User, Agent, cli
from vision_agents.core.agents import AgentLauncher
from vision_agents.plugins import decart, getstream, gemini, elevenlabs, deepgram

logger = logging.getLogger(__name__)

load_dotenv()


async def create_agent(**kwargs) -> Agent:
    processor = decart.RestylingProcessor(
        initial_prompt="Change the video style to a cute animated movie with vibrant colours",
        model="mirage_v2",
    )
    llm = gemini.LLM(model="gemini-3-pro-preview")
    agent = Agent(
        edge=getstream.Edge(),
        agent_user=User(name="Story teller", id="agent"),
        instructions="You will use the Decart processor to change the style of the video and the user's background.",
        llm=llm,
        tts=elevenlabs.TTS(voice_id="N2lVS1w4EtoT3dr4eOWO"),
        stt=deepgram.STT(),
        processors=[processor],
    )

    @llm.register_function(
        description="This function changes the prompt of the Decart processor which in turn changes the style of the video and user's background"
    )
    async def change_prompt(prompt: str) -> str:
        await processor.update_prompt(prompt)
        return f"Prompt changed to {prompt}"

    return agent


async def join_call(agent: Agent, call_type: str, call_id: str, **kwargs) -> None:
    """Join the call and start the agent."""
    # Ensure the agent user is created
    await agent.create_user()

    # Create a call
    call = await agent.create_call(call_type, call_id)

    logger.info("🤖 Starting Agent...")

    # Have the agent join the call/room
    with await agent.join(call):
        logger.info("Joining call")
        logger.info("LLM ready")
        await agent.finish()  # Run till the call ends


if __name__ == "__main__":
    cli(AgentLauncher(create_agent=create_agent, join_call=join_call))
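The change_prompt function registered via @llm.register_function is the agentic core of the demo: Gemini 3 reads the function description, decides when a voice command calls for a style change, and invokes the function with a restyling prompt, which Decart then applies to the live feed. You can expose additional tools to the model the same way. As a hedged sketch (the list_styles helper and its preset names are illustrative, not part of the Decart API), inside create_agent you might add:

    @llm.register_function(
        description="List a few example styles the user can ask for"
    )
    async def list_styles() -> str:
        # Purely illustrative preset names; any free-form prompt works with Decart
        return "Neon Nostalgia, Studio Ghibli, Cyberpunk, War Zone"

Gemini can then call this tool when someone asks what styles are available and read the answer back through the ElevenLabs TTS voice.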
Step 3: Run It
Run the following commands in your terminal to export the required API credentials and start the script. Alternatively, add the API keys to a .env file in the project root, as shown in the Requirements section above.
export GOOGLE_API_KEY=your_key
export DECART_API_KEY=your_key
export ELEVENLABS_API_KEY=your_key
export DEEPGRAM_API_KEY=your_key
export STREAM_API_KEY=your_key
export STREAM_API_SECRET=your_secret
cd realtime-video-restyling
uv run main.py
A browser tab opens with a video call interface and joins you to the call automatically. Allow camera and microphone access, say "Make my video Studio Ghibli", and watch your camera feed transform live!
Example interaction from the video:
You: "Make it Neon Nostalgia."
Agent: "OK, I've updated the video style to Neon Nostalgia."
You: "Make it a War Zone."
Agent: "OK, I've updated the video style to a War Zone."
What Makes This Stack So Powerful
This stack is one of the fastest ways for developers to ship a fully featured, low-latency video AI agent, in pure Python and under 100 lines.
Vision Agents and its integrated voice AI models abstract away turn detection, streaming, and interruption handling. Google's Gemini 3 brings agentic reasoning for prompt interpretation, and Decart's production-proven API delivers <40ms restyling without losing temporal coherence.
It's open-source, local-first (except API calls), and scalable from prototype to production.
Links & Resources
Give it a spin and see what wild style you like best. Maybe... post-apocalyptic Paris or Van Gogh's starry night? 🎨
