Chaitrali Kakde

Been building AI agents for a year… but this small script blew my mind🫨👏

Call it luck or skill, but this setup gave me the best results.

The secret? VideoSDK + Gemini Live is hands down the best combo for a real-time, talking AI that actually works. Forget clunky chatbots and laggy voice assistants; this setup lets your AI listen, understand, and respond instantly, just like a human.

In this post, we’ll show you step-by-step how to bring your AI to life, from setup to first conversation, so you can create your own smart, interactive agent in no time. By the end, you’ll see why this combo is a game-changer for anyone building real-time AI.

Step 1: Setting Up Your Agent Environment

Let's get started by setting up our development environment.

Prerequisites:

  • Python 3 (with venv support)
  • A VideoSDK auth token (from the VideoSDK dashboard)
  • A Google Gemini API key

Installation:

First, create a virtual environment and install the necessary packages:

Bash

python3 -m venv venv
source venv/bin/activate
pip install videosdk-agents videosdk-plugins-google python-dotenv

Configuration:

Create a .env file in your project root to store your API keys securely:

VIDEOSDK_AUTH_TOKEN="YOUR_VIDEOSDK_AUTH_TOKEN"
GOOGLE_API_KEY="YOUR_GEMINI_API_KEY"
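Optional sanity check: the snippet below (a minimal sketch using python-dotenv, which we installed above) confirms both keys load before you wire up the agent:

import os
from dotenv import load_dotenv

load_dotenv()  # Reads .env from the current directory
assert os.getenv("VIDEOSDK_AUTH_TOKEN"), "VIDEOSDK_AUTH_TOKEN is missing"
assert os.getenv("GOOGLE_API_KEY"), "GOOGLE_API_KEY is missing"
print("Environment configured")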

Step 2: Defining the AI Pipeline with Gemini Live

Under the hood, the RealTimePipeline streams audio from the VideoSDK meeting directly to Gemini Live, which understands and generates speech natively (no separate speech-to-text or text-to-speech stage), and streams the spoken response back into the meeting, all with minimal latency.
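In code, the whole pipeline is just the Gemini Live model wrapped in a RealTimePipeline; this is the exact wiring we'll use in the full script in Step 3:

from videosdk.agents import RealTimePipeline
from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig

# The realtime model listens and speaks natively, so no STT or TTS plugins are configured
model = GeminiRealtime(
    model="gemini-2.0-flash-live-001",
    config=GeminiLiveConfig(voice="Leda", response_modalities=["AUDIO"])
)
pipeline = RealTimePipeline(model=model)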

Step 3: Creating Your Conversational Agent

Create a main.py file:

import asyncio

from dotenv import load_dotenv
from videosdk.agents import Agent, AgentSession, RealTimePipeline, JobContext, RoomOptions, WorkerJob
from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig

# Load VIDEOSDK_AUTH_TOKEN and GOOGLE_API_KEY from the .env file
load_dotenv()

class MyVoiceAgent(Agent):
    def __init__(self):
        super().__init__(instructions="You are a helpful voice assistant that can answer questions and help with tasks.")

    async def on_enter(self):
        await self.session.say("Hello! How can I help?")

    async def on_exit(self):
        await self.session.say("Goodbye!")

async def start_session(context: JobContext):
    # Initialize Model
    model = GeminiRealtime(
        model="gemini-2.0-flash-live-001",
        # When GOOGLE_API_KEY is set in .env - DON'T pass api_key parameter
        config=GeminiLiveConfig(
            voice="Leda", # Puck, Charon, Kore, Fenrir, Aoede, Leda, Orus, and Zephyr.
            response_modalities=["AUDIO"]
        )
    )

    # Create pipeline
    pipeline = RealTimePipeline(
        model=model
    )

    session = AgentSession(
        agent=MyVoiceAgent(),
        pipeline=pipeline
    )

    try:
        await context.connect()
        await session.start()
        # Keep the session running until manually terminated
        await asyncio.Event().wait()
    finally:
        # Clean up resources when done
        await session.close()
        await context.shutdown()

def make_context() -> JobContext:
    room_options = RoomOptions(
        # room_id="YOUR_MEETING_ID",  # Set to join a pre-created room; omit to auto-create
        name="VideoSDK Realtime Agent",
        playground=True  # Outputs a playground URL so you can test from the browser
    )

    return JobContext(room_options=room_options)

if __name__ == "__main__":
    job = WorkerJob(entrypoint=start_session, jobctx=make_context)
    job.start()

To Run Your Agent:

  1. Save the complete code as main.py.
  2. Run it from your terminal: python main.py
  3. The script will output a VideoSDK Playground URL. Open this URL in your browser.
  4. Join the meeting from your browser, and your Gemini Live-powered AI agent will introduce itself and be ready to converse in real time!

Step 4: Integrate into a Live Meeting

You can take your AI agent one step further by joining it to a live meeting: have the agent use the same meeting ID your participants join, and it will interact in real time alongside them (see the sketch after the links below).

  • Using JavaScript: link
  • Using ReactJS: link
  • Using React Native: link
  • Using Android: link
  • Using Flutter: link
  • Using iOS: link
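For example, here's a minimal variation of make_context from Step 3 that joins an existing meeting; YOUR_MEETING_ID is the same ID your participants join from one of the client SDKs above (turning playground off is an assumption, since the browser playground isn't needed for a real meeting):

def make_context() -> JobContext:
    room_options = RoomOptions(
        room_id="YOUR_MEETING_ID",  # Same meeting ID used by your client app
        name="VideoSDK Realtime Agent",
        playground=False  # Assumption: no playground URL needed when joining a live meeting
    )
    return JobContext(room_options=room_options)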

Conclusion

Congrats! You’ve just built a real-time conversational AI agent using Google’s Gemini Live API and VideoSDK. This combo enables fast, natural, low-latency interactions, taking your project far beyond traditional chatbots.

Whether it’s a virtual assistant, an interactive tutor, or next-gen customer support, the possibilities are endless. The future of conversational AI is real-time, and now you have the tools to make it happen.

💡 We’d love to hear from you!

  • Were you able to set up your first AI voice agent in Python?
  • What challenges did you face while integrating the cascading pipeline?
  • Are you more curious about cascading pipeline or real-time pipeline?
  • How do you envision AI voice assistants transforming customer experiences in your business?

👉 Share your thoughts, hurdles, or success stories in the comments, or join our Discord community. We can’t wait to learn from your journey and help you build even smarter, AI-powered communication tools!
