This is a submission for the Google AI Studio Multimodal Challenge
What I Built
Teleglot is a next-generation meeting productivity platform that acts as an intelligent participant in video calls. It solves the universal problems of unproductive meetings: lack of engagement for non-native speakers, unclear outcomes, and the tedious task of note-taking. Teleglot provides real-time transcription, AI-powered summarization, live translation for global teams, and an AI co-pilot that offers the host real-time, private suggestions to guide the conversation toward a productive conclusion. It transforms passive meetings into active, actionable, and inclusive collaboration sessions.
Demo
Screenshots / Video:
Link to Video
- The live meeting interface with real-time transcription and language translation toggles.
- The AI Co-Pilot sending a real-time, private suggestion to the host.
- The meeting summary and extracted action items generated automatically at the end of the call.
How I Used Google AI Studio
Teleglot is built entirely on the powerful multimodal capabilities of Google AI Studio and the Gemini API. The application leverages Gemini as its core AI engine for understanding and generating content:
- Gemini 2.5 Flash (gemini-2.5-flash): Used for its low-latency performance to power the real-time transcription and live translation features, ensuring smooth, conversational-speed processing.
- Live API (Gemini 2.5 Flash Live Preview): This is what enables the real-time Co-Pilot. It maintains a stateful, streaming conversation with the meeting host, allowing the AI to analyze the live transcript and provide contextual, private suggestions without interrupting the flow of the meeting.
The prompts were meticulously engineered within AI Studio to ensure structured JSON outputs, making the integration with our backend seamless and reliable.
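As an illustration of that structured-output setup, here is a minimal sketch using the Python google-genai SDK; the schema fields and the summarize_meeting helper are hypothetical stand-ins rather than Teleglot's actual backend code, and the same constraint can also be expressed purely in the prompt, but response_schema makes the expected JSON shape explicit:

```python
# Minimal sketch: structured JSON output with the google-genai Python SDK.
# Schema fields and helper name are illustrative assumptions, not Teleglot's real code.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

SUMMARY_SCHEMA = types.Schema(
    type=types.Type.OBJECT,
    properties={
        "summary": types.Schema(type=types.Type.STRING),
        "action_items": types.Schema(
            type=types.Type.ARRAY,
            items=types.Schema(
                type=types.Type.OBJECT,
                properties={
                    "owner": types.Schema(type=types.Type.STRING),
                    "task": types.Schema(type=types.Type.STRING),
                },
            ),
        ),
    },
)

def summarize_meeting(transcript: str) -> str:
    """Ask Gemini for a meeting summary and action items as structured JSON."""
    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=f"Summarize this meeting and extract action items:\n{transcript}",
        config=types.GenerateContentConfig(
            response_mime_type="application/json",
            response_schema=SUMMARY_SCHEMA,
        ),
    )
    return response.text  # JSON string matching SUMMARY_SCHEMA
```

Because the model is constrained to the declared schema, the backend can parse the result with json.loads and store it directly, with no brittle post-processing.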
Multimodal Features
Teleglot's power comes from its deep integration of multiple modalities, creating a cohesive and intelligent user experience:
Audio + Text Understanding (Live Transcription & Translation): This is the foundation. Teleglot processes raw audio into text (speech-to-text) and then uses Gemini's NLP capabilities to translate that text into multiple languages in real time, breaking down language barriers instantly.
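To make that flow concrete, here is a rough sketch of sending one captured audio chunk to gemini-2.5-flash with the Python google-genai SDK; the function name, MIME type, and prompt wording are assumptions for illustration, not Teleglot's exact implementation.

```python
# Sketch: transcribe one audio chunk and translate it with gemini-2.5-flash.
# Function name, MIME type, and prompt are illustrative assumptions.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

def transcribe_and_translate(audio_bytes: bytes, target_language: str = "Spanish") -> str:
    """Send a short captured audio chunk and ask for a transcript plus its translation."""
    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=[
            types.Part.from_bytes(data=audio_bytes, mime_type="audio/webm"),
            f"Transcribe this audio, then translate the transcript into {target_language}.",
        ],
    )
    return response.text
```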
Audio + Video + Text Understanding (Post-Meeting Analysis): This is the most advanced feature. Using Gemini 2.5 Pro, Teleglot doesn't just analyze the transcript: it processes the full meeting recording (audio, video, and any shared screen content) to build a far richer understanding of the context than text alone (a code sketch of this call follows the list below). This allows it to:
- Identify key decisions based on visual cues like slides and verbal agreement.
- Accurately assign action items by understanding who volunteered for a task.
- Gauge overall meeting sentiment more effectively than text-only analysis.
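Here is a minimal sketch of that post-meeting call, assuming the recording is exported as a single video file and uploaded through the Files API; the helper name, prompt, and polling loop are illustrative rather than Teleglot's exact pipeline.

```python
# Sketch: post-meeting analysis of the full recording with Gemini 2.5 Pro.
# File path, prompt, and helper name are assumptions for illustration.
import time

from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

def analyze_recording(recording_path: str) -> str:
    """Upload the meeting recording and ask Gemini to reason over the audio and video."""
    recording = client.files.upload(file=recording_path)

    # Large videos take a moment to process; wait until the file is ready.
    while recording.state and recording.state.name == "PROCESSING":
        time.sleep(5)
        recording = client.files.get(name=recording.name)

    response = client.models.generate_content(
        model="gemini-2.5-pro",
        contents=[
            recording,
            "Review this meeting recording. List the key decisions (noting slides or "
            "verbal agreement where relevant), assign each action item to the person "
            "who volunteered for it, and describe the overall sentiment.",
        ],
    )
    return response.text
```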
Real-Time Text Analysis (AI Co-Pilot via Live API): This feature enhances the live meeting experience. The Co-Pilot continuously analyzes the live text transcript (a fusion of audio understanding and text generation) to act as a strategic partner for the host. It provides suggestions like "The team has spent 10 minutes on this topic. Suggest putting it in the parking lot?" or "Maria asked a question that wasn't fully answered." This turns the AI from a passive tool into an active facilitator.
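A rough sketch of how such a Co-Pilot session could run over the Live API with the Python google-genai SDK is shown below; the model ID, prompt, and copilot helper are assumptions for illustration, and a real deployment would feed transcript chunks in continuously instead of from a fixed list.

```python
# Sketch: streaming private co-pilot suggestions over the Live API.
# Model ID, prompt, and helper name are illustrative assumptions.
import asyncio

from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")
LIVE_MODEL = "gemini-live-2.5-flash-preview"  # placeholder; use the Live-capable model ID from AI Studio

PROMPT = (
    "You are a private meeting co-pilot for the host. Given this transcript update, "
    "reply with one short facilitation suggestion, or 'OK' if none is needed:\n"
)

async def copilot(transcript_chunks: list[str]) -> None:
    """Stream transcript snippets into a live session and print suggestions for the host."""
    config = {"response_modalities": ["TEXT"]}
    async with client.aio.live.connect(model=LIVE_MODEL, config=config) as session:
        for chunk in transcript_chunks:
            await session.send_client_content(
                turns=types.Content(role="user", parts=[types.Part(text=PROMPT + chunk)]),
                turn_complete=True,
            )
            # receive() yields messages until the model finishes its turn.
            async for message in session.receive():
                if message.text:
                    print("Co-pilot:", message.text)

asyncio.run(copilot(["Maria asked about the Q3 budget, but the topic changed before anyone answered."]))
```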
Why it enhances the experience: By combining these modalities, Teleglot moves far beyond a simple note-taker. It creates a meeting environment that is more inclusive (live translation), more efficient (automatic summaries), and more guided (AI Co-Pilot), ultimately ensuring that time spent in meetings is productive and actionable for everyone involved.