This past month, I found myself in the kitchen baking almost every week. It was a rewarding process, but I quickly hit a recurring frustration: my hands were constantly covered in flour and dough, creating a messy cycle of washing and drying just to check a recipe on my phone or a piece of paper. I created Sous-Chef to solve this friction. It allows me to stay fully immersed in the joy of baking for the people I care about, ensuring every batch is perfect without the need to fight with a screen. Beyond solving a personal pain point, I built Sous-Chef to push the boundaries of real-time interaction for the Gemini Live Agent Challenge.
The core of the experience relies on the Gemini Live API via Vertex AI, which taught me that the traditional request-response model falls short for an agent that has to listen and answer while you are mid-task. Over WebSockets, Sous-Chef handles a continuous, bidirectional stream of audio and video. I spent significant time optimizing the communication between the Angular frontend and the FastAPI backend so that the audio felt as immediate as a human conversation.
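To make the bidirectional idea concrete, here is a minimal sketch of the relay pattern, not the actual Sous-Chef code. It assumes two in-memory queues per direction standing in for the browser WebSocket and the model session; the names `pump`, `relay`, and `demo` are hypothetical. In a FastAPI handler the queues would roughly correspond to reading and writing bytes on the WebSocket.

```python
import asyncio


async def pump(source: asyncio.Queue, sink: asyncio.Queue) -> int:
    """Forward chunks from source to sink until a None sentinel arrives.

    Returns the number of chunks forwarded (the sentinel is not counted).
    """
    forwarded = 0
    while True:
        chunk = await source.get()
        if chunk is None:
            # Propagate end-of-stream so the other side can shut down too.
            await sink.put(None)
            return forwarded
        await sink.put(chunk)
        forwarded += 1


async def relay(client_in, model_out, model_in, client_out):
    """Run both directions concurrently so neither side waits on the other."""
    return await asyncio.gather(
        pump(client_in, model_out),   # microphone/camera -> model
        pump(model_in, client_out),   # model audio -> speaker
    )


async def demo():
    # Hypothetical stand-ins for the two WebSocket connections.
    client_in, model_out = asyncio.Queue(), asyncio.Queue()
    model_in, client_out = asyncio.Queue(), asyncio.Queue()

    for chunk in (b"mic-1", b"mic-2", None):
        client_in.put_nowait(chunk)
    for chunk in (b"voice-1", None):
        model_in.put_nowait(chunk)

    upstream, downstream = await relay(client_in, model_out, model_in, client_out)
    return upstream, downstream
```

Calling `asyncio.run(demo())` forwards two chunks upstream and one downstream. The point of the design is that the two `pump` tasks run side by side, so the model can start talking while the user is still streaming audio, which a single request-response cycle cannot do.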
A voice that just recites text is a podcast, but a chef that can manage your kitchen is a true assistant. To bridge this gap, I used Gemini’s Function Calling to ground the agent in reality. I implemented a tool-calling layer that connects the model to a local timer service and the Google Places API. This means that if I realize I am out of basil, the agent doesn't just sympathize; it can find the nearest open grocery store and provide directions.
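A tool-calling layer like this usually boils down to a registry that maps the function name the model asks for to a local handler. The sketch below is an illustration under assumptions, not the project's implementation: the tool names `set_timer` and `find_grocery_store`, their arguments, and the `{"name": ..., "args": ...}` call shape are all hypothetical, and the grocery handler is a placeholder that does not call the Google Places API.

```python
from typing import Any, Callable, Dict

# Registry of tool name -> handler. All names here are hypothetical.
TOOLS: Dict[str, Callable[..., Dict[str, Any]]] = {}


def tool(name: str):
    """Decorator that registers a handler under the given tool name."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register


@tool("set_timer")
def set_timer(label: str, seconds: int) -> Dict[str, Any]:
    # A real timer service would schedule a callback; this only acknowledges.
    return {"status": "started", "label": label, "seconds": seconds}


@tool("find_grocery_store")
def find_grocery_store(ingredient: str) -> Dict[str, Any]:
    # Placeholder: a real handler would query a places service here.
    return {"status": "not_implemented", "ingredient": ingredient}


def dispatch(call: Dict[str, Any]) -> Dict[str, Any]:
    """Route a model-issued function call to its handler.

    Errors are returned as data so they can be passed back to the model
    instead of crashing the live session.
    """
    name = call.get("name")
    handler = TOOLS.get(name)
    if handler is None:
        return {"error": f"unknown tool: {name}"}
    try:
        return handler(**call.get("args", {}))
    except TypeError as exc:
        # Raised when the model sends missing or unexpected arguments.
        return {"error": f"bad arguments: {exc}"}
```

For example, `dispatch({"name": "set_timer", "args": {"label": "proof dough", "seconds": 600}})` returns the acknowledgement dictionary, while an unknown tool name or missing arguments comes back as an `error` entry that can be handed to the model as the function result.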
To push the multimodal aspect further, I integrated a vision tool that allows the agent to "see" through the user's camera. This transforms the interaction from a simple voice chat into a context-aware collaboration where I can show the agent the ingredients on my counter and ask for suggestions based on what is physically there. To keep all of this scalable and reproducible, I automated the entire deployment to Google Cloud Run using custom shell scripts. Hosting the backend on GCP helps handle the high-bandwidth demands of live video and audio with the reliability a fast-paced kitchen needs.
You can check the project here
