<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Providence Ifeosame</title>
    <description>The latest articles on DEV Community by Providence Ifeosame (@provydon).</description>
    <link>https://dev.to/provydon</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F266179%2F49d865fc-8997-4b0f-865b-0beeab662759.jpeg</url>
      <title>DEV Community: Providence Ifeosame</title>
      <link>https://dev.to/provydon</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/provydon"/>
    <language>en</language>
    <item>
      <title>How I Built Niobe: An AI Waitress with Gemini Live and Google Cloud</title>
      <dc:creator>Providence Ifeosame</dc:creator>
      <pubDate>Mon, 16 Mar 2026 13:30:29 +0000</pubDate>
      <link>https://dev.to/provydon/how-i-built-niobe-an-ai-waitress-with-gemini-live-and-google-cloud-2o28</link>
      <guid>https://dev.to/provydon/how-i-built-niobe-an-ai-waitress-with-gemini-live-and-google-cloud-2o28</guid>
<description>&lt;p&gt;&lt;strong&gt;Disclaimer:&lt;/strong&gt; &lt;em&gt;I created this blog post to enter the Gemini Live Agent Challenge hackathon. When sharing on social media, use the hashtag #GeminiLiveAgentChallenge.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;What is Niobe?&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Niobe&lt;/strong&gt; is an AI waitress for restaurants. Restaurant owners upload their menu (even as a photo), get a shareable link, and customers can talk to the waitress by voice—ask about the menu, place orders, and have a natural conversation. No app install required; everything runs in the browser.&lt;/p&gt;

&lt;p&gt;I built it using &lt;strong&gt;Google’s Gemini&lt;/strong&gt; (including the &lt;strong&gt;Gemini Live API&lt;/strong&gt; for real-time voice) and &lt;strong&gt;Google Cloud&lt;/strong&gt; for deployment. Here’s how I put it together.&lt;/p&gt;




&lt;h2&gt;Two places where Gemini powers the product&lt;/h2&gt;

&lt;p&gt;I use Gemini in two different ways:&lt;/p&gt;

&lt;h3&gt;1. Gemini API: turning menu images into structured data&lt;/h3&gt;

&lt;p&gt;Restaurant owners don’t have to type their menu. They upload images (photos of the menu, PDFs, etc.). The &lt;strong&gt;Laravel&lt;/strong&gt; backend sends those images to the &lt;strong&gt;Gemini API&lt;/strong&gt; (via the Laravel AI package with the Gemini driver). Gemini returns structured text and JSON—dishes, categories, prices—and I store that in &lt;strong&gt;PostgreSQL&lt;/strong&gt;. So the “brain” of the waitress (menu + context) is partly built by Gemini from images. This is configured in Laravel with &lt;code&gt;GEMINI_API_KEY&lt;/code&gt; and the default provider in &lt;code&gt;config/ai.php&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="c1"&gt;// config/ai.php&lt;/span&gt;
&lt;span class="s1"&gt;'default'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'gemini'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="s1"&gt;'default_for_images'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'gemini'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

&lt;span class="s1"&gt;'providers'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="s1"&gt;'gemini'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="s1"&gt;'driver'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s1"&gt;'gemini'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s1"&gt;'key'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;env&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'GEMINI_API_KEY'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="c1"&gt;// ...&lt;/span&gt;
&lt;span class="p"&gt;],&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Menu extraction then calls Gemini with a prompt and the uploaded image attachments; the response is parsed as JSON and saved to the database.&lt;/p&gt;

&lt;h3&gt;2. Gemini Live API: the voice conversation&lt;/h3&gt;

&lt;p&gt;The real-time voice experience is powered by the &lt;strong&gt;Gemini Live API&lt;/strong&gt;. My &lt;strong&gt;Go voice agent&lt;/strong&gt; doesn’t do speech-to-text or text-to-speech itself; it acts as a &lt;strong&gt;proxy&lt;/strong&gt; between the browser and Gemini Live:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The customer opens the “Talk” page and connects via &lt;strong&gt;WebSocket&lt;/strong&gt; to the Go service (&lt;code&gt;/live?niobe=&amp;lt;slug&amp;gt;&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;The Go agent loads the waitress and menu from the &lt;strong&gt;same PostgreSQL database&lt;/strong&gt; Laravel uses, builds a system instruction and tool definitions, and opens a &lt;strong&gt;Gemini Live&lt;/strong&gt; session using the Google GenAI SDK.&lt;/li&gt;
&lt;li&gt;Audio flows both ways: browser ↔ Go agent ↔ Gemini Live. The model speaks and listens in real time, with natural turn-taking and interrupt handling.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When the model decides to take an action (e.g. “place order”), it sends a &lt;strong&gt;tool call&lt;/strong&gt; to the agent. The Go service runs &lt;strong&gt;LocalNiobeTools&lt;/strong&gt;: it writes to the database (e.g. &lt;code&gt;waitress_action_logs&lt;/code&gt;), can send email or fire webhooks, and returns the result back to Gemini. The model then confirms to the user in voice. So: &lt;strong&gt;Gemini Live&lt;/strong&gt; = voice + reasoning; &lt;strong&gt;Go + PostgreSQL&lt;/strong&gt; = tools and persistence.&lt;/p&gt;
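&lt;p&gt;At its core, that loop is “look up the named tool, run it, send the result back.” A stripped-down Go sketch of the dispatch step (&lt;code&gt;place_order&lt;/code&gt; is an illustrative tool name; the real &lt;strong&gt;LocalNiobeTools&lt;/strong&gt; handlers also write to the database):&lt;/p&gt;

```go
package main

import "fmt"

// ToolHandler runs one named tool and returns a result map that is
// sent back to Gemini Live as the function response.
type ToolHandler func(args map[string]any) map[string]any

// tools maps tool names (as declared to the model) to local handlers.
// "place_order" is an illustrative name, not the exact production one.
var tools = map[string]ToolHandler{
	"place_order": func(args map[string]any) map[string]any {
		// In the real agent this writes to waitress_action_logs
		// and may send an email or fire a webhook.
		return map[string]any{"status": "confirmed", "item": args["item"]}
	},
}

// dispatch resolves a tool call from the model; unknown tools get an
// error result instead of crashing the session.
func dispatch(name string, args map[string]any) map[string]any {
	h, ok := tools[name]
	if !ok {
		return map[string]any{"error": "unknown tool: " + name}
	}
	return h(args)
}

func main() {
	res := dispatch("place_order", map[string]any{"item": "Jollof Rice"})
	fmt.Println(res["status"]) // prints confirmed
}
```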

&lt;p&gt;The Go agent connects to Gemini Live with the Google GenAI SDK and wires up system instruction, tools, and audio config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// agent/live/google.go (simplified)&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ClientConfig&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;APIKey&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;      &lt;span class="n"&gt;cfg&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GetAPIKey&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;HTTPOptions&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;httpOpts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="s"&gt;"gemini-2.5-flash-native-audio-preview-12-2025"&lt;/span&gt;

&lt;span class="n"&gt;connectConfig&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;LiveConnectConfig&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;ResponseModalities&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Modality&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ModalityAudio&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;SpeechConfig&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SpeechConfig&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;VoiceConfig&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;VoiceConfig&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;PrebuiltVoiceConfig&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;PrebuiltVoiceConfig&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;VoiceName&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"Aoede"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;SystemInstruction&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Content&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;Parts&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewPartFromText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;systemInstruction&lt;/span&gt;&lt;span class="p"&gt;)},&lt;/span&gt;
        &lt;span class="n"&gt;Role&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;  &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;RoleUser&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;Tools&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;sess&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Live&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;connectConfig&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From there, a proxy bridges the browser WebSocket and this session so audio and tool calls flow both ways.&lt;/p&gt;
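&lt;p&gt;That bridge boils down to two concurrent copy loops, one per direction, torn down when either side closes. Here is the pattern in miniature, with plain channels standing in for the browser WebSocket and the Live session:&lt;/p&gt;

```go
package main

import (
	"fmt"
	"sync"
)

// pump copies frames from src to dst until src closes, then closes dst.
// In the real proxy, src and dst are the browser WebSocket and the
// Gemini Live session rather than channels.
func pump(dst chan<- []byte, src <-chan []byte) {
	defer close(dst)
	for frame := range src {
		dst <- frame
	}
}

func main() {
	browserIn := make(chan []byte) // audio arriving from the browser
	geminiIn := make(chan []byte)  // audio forwarded to Gemini Live
	var wg sync.WaitGroup

	wg.Add(1)
	go func() {
		defer wg.Done()
		pump(geminiIn, browserIn)
	}()

	go func() {
		browserIn <- []byte("pcm-frame")
		close(browserIn)
	}()

	for frame := range geminiIn {
		fmt.Println(string(frame)) // prints pcm-frame
	}
	wg.Wait()
}
```

&lt;p&gt;In the full proxy you would run one pump per direction and tie both to a shared context, so that either side hanging up unwinds the whole session cleanly.&lt;/p&gt;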




&lt;h2&gt;Why Google Cloud?&lt;/h2&gt;

&lt;p&gt;I run the app on &lt;strong&gt;Google Cloud&lt;/strong&gt; so the Laravel app and the Go agent can share one &lt;strong&gt;Cloud SQL (PostgreSQL)&lt;/strong&gt; instance. The agent is deployed as a container (e.g. on &lt;strong&gt;Cloud Run&lt;/strong&gt;), and the web app is deployed with the &lt;strong&gt;Terraform&lt;/strong&gt; and &lt;strong&gt;Cloud Build&lt;/strong&gt; configuration in the &lt;code&gt;deploy/&lt;/code&gt; folder. Sharing a VPC and database means low latency and a single source of truth for menus, waitresses, and action logs. Configuration is via environment variables—no secrets in code:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Agent (Go) – example .env&lt;/span&gt;
&lt;span class="nv"&gt;GEMINI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your_key
&lt;span class="nv"&gt;DATABASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;postgres://user:password@/niobe?host&lt;span class="o"&gt;=&lt;/span&gt;/cloudsql/PROJECT:REGION:INSTANCE
&lt;span class="nv"&gt;PORT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;8080
&lt;span class="nv"&gt;APP_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;https://your-laravel-app.run.app

&lt;span class="c"&gt;# Optional: use Vertex AI instead of Gemini API&lt;/span&gt;
&lt;span class="c"&gt;# GOOGLE_GENAI_USE_VERTEXAI=true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;Architecture in one picture&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Browser&lt;/strong&gt; → &lt;strong&gt;Laravel&lt;/strong&gt; (HTTPS) for dashboard and menu upload; Laravel uses &lt;strong&gt;Gemini API&lt;/strong&gt; for menu extraction and writes to &lt;strong&gt;PostgreSQL&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Browser&lt;/strong&gt; → &lt;strong&gt;Go agent&lt;/strong&gt; (WebSocket) for voice; the agent reads from &lt;strong&gt;PostgreSQL&lt;/strong&gt;, talks to &lt;strong&gt;Gemini Live API&lt;/strong&gt;, and runs tools (DB, email, webhooks) in-process.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So: one database, two Gemini touchpoints (Gemini API for menus, Gemini Live for voice), and Google Cloud to host and connect it all.&lt;/p&gt;




&lt;h2&gt;What I’d do next&lt;/h2&gt;

&lt;p&gt;I’d add more tool types (e.g. table reservation, kitchen display), support for more languages, and tighter integration with POS systems. The current stack—Laravel + Vue/Inertia for the app, Go for the voice proxy, Gemini for vision and live voice, and Google Cloud for deployment—gives me a clear path to scale.&lt;/p&gt;

&lt;p&gt;If you want to see the code or run it yourself, check out the repo and the &lt;a href="https://github.com/your-org/niobe-project/blob/main/docs/ARCHITECTURE.md" rel="noopener noreferrer"&gt;architecture doc&lt;/a&gt; for diagrams and data flows.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This post was created for the purposes of entering the Gemini Live Agent Challenge hackathon. #GeminiLiveAgentChallenge&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>gemini</category>
      <category>googlecloud</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
