DEV Community

Nikolaos Georgios Ntaiko

How I Built "Sous-Chef": A Voice-First Cooking Assistant using Gemini Live API

This past month, I found myself in the kitchen baking almost every week. It was a rewarding process, but I quickly hit a recurring frustration: my hands were constantly covered in flour and dough, creating a messy cycle of washing and drying just to check a recipe on my phone or a piece of paper. I created Sous-Chef to solve this friction. It allows me to stay fully immersed in the joy of baking for the people I care about, ensuring every batch is perfect without the need to fight with a screen. Beyond solving a personal pain point, I built Sous-Chef to push the boundaries of real-time interaction for the Gemini Live Agent Challenge.

The core of the experience relies on the Gemini Live API via Vertex AI, which taught me that the traditional request-response model is insufficient for truly helpful agents. By leveraging WebSockets, Sous-Chef handles a continuous, bidirectional stream of audio and video. I spent significant time optimizing the communication between the Angular frontend and the FastAPI backend to ensure the audio felt as immediate as a human conversation.
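The streaming pattern described above can be sketched with plain asyncio. This is a minimal illustration of the relay idea, not the actual Sous-Chef code: the queue names are invented, and the Gemini Live session is replaced by an echoing stand-in. The point is the shape of the pipeline, where audio flows up and down continuously instead of waiting for a request-response round trip.

```python
import asyncio

# Illustrative sketch of a bidirectional audio relay. `uplink` and
# `downlink` are hypothetical names; in the real app the middle task
# would be the Gemini Live session reached over a WebSocket.

async def relay(mic_chunks):
    uplink: asyncio.Queue = asyncio.Queue()    # browser mic -> model
    downlink: asyncio.Queue = asyncio.Queue()  # model audio -> browser
    played = []

    async def send_audio():
        # Push chunks as they arrive; never block on a "reply".
        for chunk in mic_chunks:
            await uplink.put(chunk)
        await uplink.put(None)  # end-of-stream marker

    async def model_side():
        # Stand-in for the live session: echo each chunk back.
        while (chunk := await uplink.get()) is not None:
            await downlink.put(chunk)
        await downlink.put(None)

    async def receive_audio():
        # In the real app these bytes go back over the WebSocket.
        while (chunk := await downlink.get()) is not None:
            played.append(chunk)

    await asyncio.gather(send_audio(), model_side(), receive_audio())
    return played
```

Running all three tasks concurrently is what makes the conversation feel immediate: playback starts while the microphone is still streaming.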

A voice that just recites text is a podcast, but a chef that can manage your kitchen is a true assistant. To bridge this gap, I used Gemini’s Function Calling to ground the agent in reality. I implemented a robust tool-calling layer that connects the model to a local timer service and the Google Places API. This means that if I realize I am out of basil, the agent doesn't just sympathize; it can actually find the nearest open grocery store and provide directions.
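A tool-calling layer like this boils down to two pieces: declarations the model can see, and a dispatcher that routes the model's function calls to local handlers. The sketch below assumes hypothetical tool names (`set_timer`, `find_grocery_store`) and returns canned results; a real handler would drive the timer service or the Places API.

```python
# Tool declarations in the JSON-schema shape that Gemini function
# calling expects. Names and fields here are illustrative.
TOOLS = [
    {
        "name": "set_timer",
        "description": "Start a kitchen timer.",
        "parameters": {
            "type": "object",
            "properties": {"minutes": {"type": "integer"}},
            "required": ["minutes"],
        },
    },
    {
        "name": "find_grocery_store",
        "description": "Find the nearest open grocery store.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
]

def set_timer(minutes: int) -> dict:
    # Stand-in for the local timer service.
    return {"status": "running", "minutes": minutes}

def find_grocery_store(query: str) -> dict:
    # Placeholder result; a real handler would query the Places API.
    return {"name": "Example Market", "open_now": True, "query": query}

HANDLERS = {"set_timer": set_timer, "find_grocery_store": find_grocery_store}

def dispatch(call: dict) -> dict:
    """Route a model-emitted function call to its local handler."""
    return HANDLERS[call["name"]](**call["args"])
```

The handler's return value is sent back to the model as the function response, so the agent can phrase the result conversationally ("your timer is set for ten minutes").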

To push the multimodal aspect further, I integrated a vision tool that allows the agent to "see" through the user's camera. This transforms the interaction from a simple voice chat into a context-aware collaboration where I can show the agent the ingredients on my counter and ask for suggestions based on what is physically there. To ensure this was all scalable and reproducible, I automated the entire deployment process to Google Cloud Run using custom shell scripts. Hosting the backend on GCP ensures that the high-bandwidth requirements of live video and audio are handled with the reliability needed for a fast-paced kitchen environment.
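Live video is far heavier than audio, so a practical vision pipeline throttles camera frames before sending them to the model. The sketch below is an assumption about how such a gate could look, not the actual Sous-Chef implementation: the one-frame-per-second interval and the function name are illustrative, and frames are assumed to arrive from the browser already JPEG-encoded.

```python
import base64
from typing import Optional

FRAME_INTERVAL_S = 1.0  # at most one frame per second (illustrative)
_last_sent = 0.0

def maybe_encode_frame(jpeg_bytes: bytes, now: float) -> Optional[str]:
    """Drop frames that arrive too soon after the last one, and
    base64-encode the frames we keep so they can travel as text
    alongside the audio stream."""
    global _last_sent
    if now - _last_sent < FRAME_INTERVAL_S:
        return None  # skip this frame
    _last_sent = now
    return base64.b64encode(jpeg_bytes).decode("ascii")
```

Dropping frames on the backend keeps the WebSocket responsive for audio, which matters more than video smoothness when you are mid-recipe with flour on your hands.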

You can check out the project here.
