Imagine walking down the street, asking your glasses what kind of plant you're looking at, and getting a response in near real time. With the combination of the Gemini Live API, LiveKit, and the Meta Wearables SDK, this isn't science fiction anymore; it's something you can build today.
In this post, we’ll walk through how to set up a vision-enabled AI agent that connects to Meta Ray-Ban glasses via a secure WebRTC proxy.
The Architecture
The setup involves several layers to ensure low-latency, secure communication between the wearable device and the AI:
- Meta Ray-Ban Glasses: Capture video and audio, connecting via Bluetooth to your phone.
- Phone (Android/iOS): Acts as the gateway, connecting via WebRTC to LiveKit Cloud.
- LiveKit Cloud: Serves as a secure, high-performance proxy for the Gemini Live API.
- Gemini Live API: Processes the stream via WebSockets, enabling real-time multimodal interaction.
The Backend: Building the Gemini Live Agent
We use the LiveKit Agents framework to act as a secure WebRTC proxy for the Gemini Live API. This agent joins the LiveKit room, listens to the audio, and processes the video stream from the glasses.
Setting up the Assistant
The core of our agent is the AgentSession. We use the google.beta.realtime.RealtimeModel to interface with Gemini. Crucially, we enable video_input in the RoomOptions to allow the agent to "see."
# Assumes the standard LiveKit environment variables (LIVEKIT_URL, LIVEKIT_API_KEY,
# LIVEKIT_API_SECRET) and a Gemini API key (GOOGLE_API_KEY) are set for the agent.
# Import paths may vary slightly between livekit-agents versions.
from livekit.agents import AgentSession, JobContext, room_io
from livekit.plugins import google

# `server` is the agent server/app object created elsewhere in the module
@server.rtc_session()
async def entrypoint(ctx: JobContext):
    ctx.log_context_fields = {"room": ctx.room.name}

    session = AgentSession(
        llm=google.beta.realtime.RealtimeModel(
            model="gemini-2.5-flash-native-audio-preview-12-2025",
            proactivity=True,
            enable_affective_dialog=True,
        ),
        # Voice activity detection model preloaded in the worker process
        vad=ctx.proc.userdata["vad"],
    )

    await session.start(
        room=ctx.room,
        agent=Assistant(),
        room_options=room_io.RoomOptions(
            # Subscribe to the video track published from the glasses
            video_input=True,
        ),
    )

    await ctx.connect()
    await session.generate_reply()
By setting video_input=True, the agent automatically requests the video track from the room, which in this case is the 1 FPS stream coming from the glasses.
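The session also hands control to an Assistant agent that isn't shown above. Here's a minimal sketch of what it could look like; the persona instructions are illustrative assumptions, not part of the original example:

from livekit.agents import Agent

class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(
            # Illustrative system instructions; tune these to shape how the
            # model narrates the incoming video
            instructions=(
                "You are a helpful assistant that sees the world through the "
                "user's glasses. Describe what is in view and answer questions "
                "about it concisely."
            )
        )

Whatever you put in instructions becomes the system prompt for the realtime model, so this is the place to define the assistant's personality and how it should talk about what it sees.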
Running the Agent
To start your agent in development mode and make it accessible globally via LiveKit Cloud, simply run:
uv run agent.py dev
Find the full Gemini Live vision agent example in the LiveKit docs.
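One wiring detail the snippet above glosses over: the entrypoint pulls a preloaded Silero VAD out of ctx.proc.userdata["vad"], and the dev subcommand comes from the Agents CLI runner at the bottom of the agent file. Here's a minimal sketch of both, assuming the classic WorkerOptions-based runner and the Silero plugin; if you're using the newer server-decorator style shown above, register the prewarm hook the way your version of the framework expects:

from livekit.agents import JobProcess, WorkerOptions, cli
from livekit.plugins import silero

def prewarm(proc: JobProcess):
    # Load the Silero VAD model once per worker process so every session
    # can reuse it via ctx.proc.userdata["vad"]
    proc.userdata["vad"] = silero.VAD.load()

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint, prewarm_fnc=prewarm))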
Connection & Authentication
To connect your frontend to LiveKit, you need a short-lived access token.
CLI Token Generation
For testing and demos, you can quickly generate a token using the LiveKit CLI:
lk token create \
  --api-key <YOUR_API_KEY> \
  --api-secret <YOUR_API_SECRET> \
  --join \
  --room <ROOM_NAME> \
  --identity <PARTICIPANT_IDENTITY> \
  --valid-for 24h
In a production environment, you should always issue tokens from a secure backend to keep your API secrets safe.
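As a sketch of what that could look like, here's a minimal token endpoint built on the livekit-api Python package; the FastAPI framework, the /token route, and its query parameters are assumptions for illustration, not part of the official example:

from datetime import timedelta

from fastapi import FastAPI
from livekit import api

app = FastAPI()

@app.get("/token")
async def create_token(room: str, identity: str) -> dict:
    # AccessToken reads LIVEKIT_API_KEY / LIVEKIT_API_SECRET from the
    # environment when constructed without arguments
    token = (
        api.AccessToken()
        .with_identity(identity)
        .with_grants(api.VideoGrants(room_join=True, room=room))
        .with_ttl(timedelta(hours=1))
        .to_jwt()
    )
    return {"token": token}

Your Android client would then fetch a token from an endpoint like this at runtime and pass it to connectToLiveKit instead of hard-coding one.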
The Frontend: Meta Wearables Integration
This example targets Android devices (like the Google Pixel). You'll need the Meta Wearables Toolkit and its Android client sample project.
- Clone the Sample: Get the Android client example.
- Configure local.properties: Add your GitHub Token as required by the Meta SDK.
- Update Connection Details: In StreamScreen.kt, replace the server URL and token with your LiveKit details:

  // streamViewModel.connectToLiveKit
  connectToLiveKit(
      url = "wss://your-project.livekit.cloud",
      token = "your-generated-token"
  )

- Run the App: Connect your device via USB and deploy from Android Studio.
Conclusion
By bridging Meta Wearables with Gemini Live via LiveKit, we've created a powerful, low-latency vision AI experience. This architecture is scalable and secure, providing a foundation for the next generation of wearable AI applications.
Happy hacking! 🚀
