Thor 雷神 Schaeff for Google AI

Realtime Multimodal AI on Ray-Ban Meta Glasses with Gemini Live & LiveKit

Imagine walking down the street, asking your glasses what kind of plant you're looking at, and getting a response in near real-time. With the Gemini Live API, LiveKit, and the Meta Wearables SDK, this isn't science fiction anymore; it's something you can build today.

In this post, we’ll walk through how to set up a vision-enabled AI agent that connects to Ray-Ban Meta glasses via a secure WebRTC proxy.

The Architecture

The setup involves several layers to ensure low-latency, secure communication between the wearable device and the AI:

  1. Ray-Ban Meta Glasses: Capture video and audio and connect to your phone via Bluetooth.
  2. Phone (Android/iOS): Acts as the gateway, connecting via WebRTC to LiveKit Cloud.
  3. LiveKit Cloud: Serves as a secure, high-performance proxy for the Gemini Live API.
  4. Gemini Live API: Processes the stream via WebSockets, enabling real-time multimodal interaction.


The Backend: Building the Gemini Live Agent

We use the LiveKit Agents framework to act as a secure WebRTC proxy for the Gemini Live API. This agent joins the LiveKit room, listens to the audio, and processes the video stream from the glasses.

Setting up the Assistant

The core of our agent is the AgentSession. We use the google.beta.realtime.RealtimeModel to interface with Gemini. Crucially, we enable video_input in the RoomOptions to allow the agent to "see."

@server.rtc_session()
async def entrypoint(ctx: JobContext):
    # Tag every log line from this job with the room name for easier debugging
    ctx.log_context_fields = {"room": ctx.room.name}

    session = AgentSession(
        # Gemini Live handles speech-to-speech and vision natively over WebSockets
        llm=google.beta.realtime.RealtimeModel(
            model="gemini-2.5-flash-native-audio-preview-12-2025",
            proactivity=True,             # let the model decide when to chime in
            enable_affective_dialog=True  # adapt tone and delivery to the user
        ),
        vad=ctx.proc.userdata["vad"],     # voice activity detection (typically preloaded in a prewarm hook)
    )

    await session.start(
        room=ctx.room,
        agent=Assistant(),
        room_options=room_io.RoomOptions(
            video_input=True,  # subscribe to the video track published by the glasses
        )
    )
    await ctx.connect()
    await session.generate_reply()

By setting video_input=True, the agent automatically requests the video track from the room, which in this case is the 1FPS stream coming from the glasses.
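
The Assistant referenced in session.start isn't shown in the snippet above. A minimal sketch of what it might look like, assuming the standard Agent base class from the LiveKit Agents framework and hypothetical instructions text:

from livekit.agents import Agent


class Assistant(Agent):
    def __init__(self) -> None:
        # System instructions shape how Gemini talks about what the camera sees
        super().__init__(
            instructions=(
                "You are a helpful voice assistant running on smart glasses. "
                "Answer questions about what the user is looking at, "
                "keeping replies brief and conversational."
            )
        )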

Running the Agent

To start your agent in development mode and make it accessible globally via LiveKit Cloud, simply run:

uv run agent.py dev
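The agent also needs credentials for LiveKit and Gemini at runtime. A minimal .env sketch, assuming the standard variable names read by the LiveKit Agents framework and the Google plugin (adjust to your own project):

# .env for the agent process
LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=<YOUR_API_KEY>
LIVEKIT_API_SECRET=<YOUR_API_SECRET>
GOOGLE_API_KEY=<YOUR_GEMINI_API_KEY>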

Find the full Gemini Live vision agent example in the LiveKit docs.


Connection & Authentication

To connect your frontend to LiveKit, you need a short-lived access token.

CLI Token Generation

For testing and demos, you can quickly generate a token using the LiveKit CLI:

lk token create \
  --api-key <YOUR_API_KEY> \
  --api-secret <YOUR_API_SECRET> \
  --join \
  --room <ROOM_NAME> \
  --identity <PARTICIPANT_IDENTITY> \
  --valid-for 24h

In a production environment, you should always issue tokens from a secure backend to keep your API secrets safe.
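
For illustration, here's a rough sketch of the token minting such a backend might do, using the livekit-api Python package; the function name and identity handling are assumptions to adapt to your own auth flow:

from datetime import timedelta

from livekit import api


def create_join_token(identity: str, room: str) -> str:
    # The API key/secret never leave the server; only the short-lived JWT
    # is handed to the phone/glasses client.
    return (
        api.AccessToken()  # reads LIVEKIT_API_KEY / LIVEKIT_API_SECRET from the environment
        .with_identity(identity)
        .with_grants(api.VideoGrants(room_join=True, room=room))
        .with_ttl(timedelta(hours=1))
        .to_jwt()
    )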


The Frontend: Meta Wearables Integration

This example targets Android devices (like the Google Pixel). You'll need the Meta Wearables Toolkit and its accompanying sample project.

  • Clone the Sample: Get the Android client example.
  • Configure local.properties: Add your GitHub Token as required by the Meta SDK.
  • Update Connection Details: In StreamScreen.kt, replace the server URL and token with your LiveKit details:
// streamViewModel.connectToLiveKit
connectToLiveKit(
    url = "wss://your-project.livekit.cloud",
    token = "your-generated-token"
)
  • Run the App: Connect your device via USB and deploy from Android Studio.

Conclusion

By bridging Meta Wearables with Gemini Live via LiveKit, we've created a powerful, low-latency vision AI experience. This architecture is scalable and secure, providing a foundation for the next generation of wearable AI applications.

Happy hacking! 🚀
