<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Benjamin21</title>
    <description>The latest articles on DEV Community by Benjamin21 (@hislordshipprof).</description>
    <link>https://dev.to/hislordshipprof</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3826312%2Fbc33e51e-b829-4938-9a36-4556c5f52d55.jpeg</url>
      <title>DEV Community: Benjamin21</title>
      <link>https://dev.to/hislordshipprof</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/hislordshipprof"/>
    <language>en</language>
    <item>
      <title>Building OPSLY: Voice + Vision AI Property Management with Gemini Live API, Gemini Vision &amp; Google ADK</title>
      <dc:creator>Benjamin21</dc:creator>
      <pubDate>Mon, 16 Mar 2026 04:59:34 +0000</pubDate>
      <link>https://dev.to/hislordshipprof/building-opsly-voice-vision-ai-property-management-with-gemini-live-api-gemini-vision-google-3d6f</link>
      <guid>https://dev.to/hislordshipprof/building-opsly-voice-vision-ai-property-management-with-gemini-live-api-gemini-vision-google-3d6f</guid>
      <description>&lt;p&gt;Property managers running 50 to 500 units still rely on phone calls, WhatsApp groups, and spreadsheets. When a tenant reports a broken boiler, it takes five phone calls just to get a plumber scheduled. No tracking, no SLA, no visibility.&lt;/p&gt;

&lt;p&gt;I built &lt;strong&gt;OPSLY&lt;/strong&gt; to fix this — a multimodal AI platform combining &lt;strong&gt;voice and vision&lt;/strong&gt; where tenants speak to report issues and upload photos for AI damage assessment, technicians get hands-free job briefings, and managers watch everything update live on one dashboard.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;This project was created for the purposes of entering the Gemini Live Agent Challenge hackathon.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fme445hmmqqm3558whvwi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fme445hmmqqm3558whvwi.png" alt="OPSLY Architecture"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Stack
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Backend&lt;/td&gt;
&lt;td&gt;NestJS + TypeScript + Prisma ORM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Database&lt;/td&gt;
&lt;td&gt;PostgreSQL (Supabase)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Frontend&lt;/td&gt;
&lt;td&gt;React + TypeScript + Vite + Tailwind CSS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Real-time&lt;/td&gt;
&lt;td&gt;Socket.IO (WebSockets)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI — Voice&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Gemini Live API&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI — Vision&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Gemini Vision&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI — Agents&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Google ADK&lt;/strong&gt; (6 agents)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deploy&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Google Cloud Run&lt;/strong&gt; (backend) + Vercel (frontend)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  How Gemini Powers Every Layer
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Gemini Live API — Voice Interaction
&lt;/h3&gt;

&lt;p&gt;The first core differentiator of OPSLY is that nobody types. Tenants speak naturally to an AI agent to report maintenance issues, and technicians receive hands-free job briefings while on-site. The second is that the AI can &lt;strong&gt;see&lt;/strong&gt; — tenants upload damage photos and Gemini Vision instantly assesses severity, damage type, and confidence.&lt;/p&gt;

&lt;p&gt;I used the &lt;strong&gt;Gemini Live API&lt;/strong&gt; (&lt;code&gt;gemini-2.5-flash-native-audio&lt;/code&gt;) for bidirectional streaming voice sessions. The key features that made this work:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Real-time streaming&lt;/strong&gt; — the tenant speaks and the AI responds conversationally, not in request/response cycles&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Barge-in support&lt;/strong&gt; — tenants can interrupt the agent mid-sentence, and the agent stops talking and responds to what was just said. This is critical for making the exchange feel like a real conversation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool calling during voice&lt;/strong&gt; — while talking, the agent calls backend tools to create work orders, look up unit details, and check existing issues&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here's how the flow works:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Tenant speaks → Gemini Live API processes audio
  → Agent identifies intent (new issue report)
  → Agent asks clarifying questions (location, severity)
  → Agent calls createWorkOrder tool
  → Backend creates work order + emits WebSocket event
  → Manager's dashboard updates in real-time
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
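&lt;p&gt;The last three steps of that flow can be sketched as a small tool handler. This is a minimal sketch, not OPSLY's actual code: the &lt;code&gt;CreateWorkOrderArgs&lt;/code&gt; shape, the &lt;code&gt;workOrder.created&lt;/code&gt; event name, and the &lt;code&gt;makeToolHandler&lt;/code&gt; helper are all assumptions for illustration, and the WebSocket layer is replaced by a plain callback.&lt;/p&gt;

```typescript
// Hypothetical shapes: the real OPSLY tool schema is not shown in the article.
type Severity = "LOW" | "MEDIUM" | "HIGH";

interface CreateWorkOrderArgs {
  unitId: string;
  location: string;
  description: string;
  severity: Severity;
}

interface WorkOrder extends CreateWorkOrderArgs {
  id: string;
  status: "OPEN";
}

interface ToolCall {
  name: string;
  args: CreateWorkOrderArgs;
}

// Stand-in for the WebSocket layer: the caller decides what "emit" does.
type Emit = (event: string, payload: WorkOrder) => void;

function makeToolHandler(emit: Emit) {
  let nextId = 1;
  return function handleToolCall(call: ToolCall): WorkOrder {
    if (call.name !== "createWorkOrder") {
      throw new Error("Unknown tool: " + call.name);
    }
    // In a real backend this step would be an authenticated REST call plus a DB write.
    const order: WorkOrder = { ...call.args, id: "wo-" + nextId, status: "OPEN" };
    nextId += 1;
    // The manager dashboard would subscribe to this (assumed) event name.
    emit("workOrder.created", order);
    return order;
  };
}
```

&lt;p&gt;The handler takes &lt;code&gt;emit&lt;/code&gt; as a parameter so the same logic can be exercised without a live socket connection.&lt;/p&gt;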



&lt;p&gt;The same voice architecture powers the &lt;strong&gt;technician briefing&lt;/strong&gt;: a technician says "brief me on my next job" and the AI reads out the address, issue description, severity, and tenant notes. The exchange is completely hands-free, which suits someone holding tools or driving.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Google ADK — Multi-Agent Orchestration
&lt;/h3&gt;

&lt;p&gt;A single AI agent trying to handle everything (triage, status lookups, scheduling, escalations, analytics) would be hard to keep reliable. I used &lt;strong&gt;Google ADK&lt;/strong&gt; to build a multi-agent system with a central router:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌───────────────────┐
│  OpslyRouterAgent │  ← Classifies intent, routes to specialist
└─────────┬─────────┘
          │
    ┌─────┴──────────────────────────────────────┐
    │           │            │          │         │
    ▼           ▼            ▼          ▼         ▼
┌────────┐ ┌────────┐ ┌──────────┐ ┌───────┐ ┌─────────┐
│ Triage │ │ Status │ │ Schedule │ │Escal. │ │Analytics│
│ Agent  │ │ Agent  │ │  Agent   │ │ Agent │ │  Agent  │
└────────┘ └────────┘ └──────────┘ └───────┘ └─────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;TriageAgent&lt;/strong&gt; — Collects issue details from tenants, requests photos, creates work orders&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;StatusAgent&lt;/strong&gt; — Answers questions about work order status, assignments, and ETAs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ScheduleAgent&lt;/strong&gt; — Helps technicians manage their job queue and update statuses&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;EscalationAgent&lt;/strong&gt; — Handles SLA breaches and emergency escalations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AnalyticsAgent&lt;/strong&gt; — Provides operational metrics for managers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each agent has a focused system prompt and its own set of backend tools. The router agent classifies the user's intent and delegates to the right specialist. This keeps each agent simple and reliable: an agent can only call the tools it was given, so it has little room to hallucinate capabilities it doesn't have.&lt;/p&gt;
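&lt;p&gt;The router-plus-specialists pattern can be illustrated without any agent framework. The sketch below is not ADK code and not OPSLY's router: it swaps the LLM intent classifier for naive keyword matching so it stays self-contained, and every keyword and tool name in it is hypothetical.&lt;/p&gt;

```typescript
// Agent names mirror the diagram above; keywords and tool names are made up.
type AgentName = "triage" | "status" | "schedule" | "escalation" | "analytics";

interface Specialist {
  name: AgentName;
  keywords: string[]; // stand-in for LLM intent classification
  tools: string[];    // the only tools this agent may call
}

const triage: Specialist = {
  name: "triage",
  keywords: ["broken", "leak", "not working"],
  tools: ["createWorkOrder", "getUnitDetails"],
};

const specialists: Specialist[] = [
  { name: "escalation", keywords: ["emergency", "urgent", "overdue"], tools: ["escalateWorkOrder"] },
  { name: "status", keywords: ["status", "when", "arrive"], tools: ["getWorkOrderStatus"] },
  { name: "schedule", keywords: ["next job", "queue", "brief"], tools: ["listAssignedJobs", "updateJobStatus"] },
  { name: "analytics", keywords: ["metrics", "kpi"], tools: ["getOperationalMetrics"] },
  triage,
];

// Returns the first specialist with a keyword in the message; falls back to triage.
function route(message: string): Specialist {
  const text = message.toLowerCase();
  for (const s of specialists) {
    if (s.keywords.some((k) => text.indexOf(k) !== -1)) {
      return s;
    }
  }
  return triage;
}

// A specialist may only call tools on its own list.
function canCall(agent: Specialist, tool: string): boolean {
  return agent.tools.indexOf(tool) !== -1;
}
```

&lt;p&gt;With this layout, &lt;code&gt;route("brief me on my next job")&lt;/code&gt; selects the schedule specialist, and &lt;code&gt;canCall&lt;/code&gt; rejects any attempt by that specialist to use &lt;code&gt;createWorkOrder&lt;/code&gt;.&lt;/p&gt;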

&lt;p&gt;&lt;strong&gt;Critically, agents never write to the database directly.&lt;/strong&gt; Every action goes through authenticated REST endpoints. This means the same validation, RBAC guards, and audit trail that protect the API also protect agent actions.&lt;/p&gt;
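&lt;p&gt;One way to picture this rule: an agent tool only builds an HTTP request that carries the caller's credentials, and the API decides whether to honor it. The sketch below assumes a &lt;code&gt;/work-orders&lt;/code&gt; path and bearer-token auth, neither of which is confirmed by the article, and it stops short of sending the request.&lt;/p&gt;

```typescript
// Hypothetical endpoint path and token scheme; OPSLY's real API is not shown in the article.
interface HttpRequest {
  method: "POST";
  url: string;
  headers: { [name: string]: string };
  body: string;
}

// Builds the REST request a tool would send. The tool has no database access of its own.
function buildCreateWorkOrderRequest(baseUrl: string, userToken: string, args: object): HttpRequest {
  if (userToken.length === 0) {
    throw new Error("Agent tools require the caller's auth token");
  }
  return {
    method: "POST",
    url: baseUrl + "/work-orders",
    headers: {
      "Content-Type": "application/json",
      // Forwarding the user's own token means the API's RBAC guards apply to the agent too.
      Authorization: "Bearer " + userToken,
    },
    body: JSON.stringify(args),
  };
}
```

&lt;p&gt;Because the request is identical to one a browser client would send, validation and audit logging live in one place, the API, rather than being duplicated in each agent.&lt;/p&gt;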

&lt;h3&gt;
  
  
  3. Gemini Vision — Photo Damage Assessment
&lt;/h3&gt;

&lt;p&gt;When a tenant reports an issue, the AI asks them to upload a photo. That photo goes through &lt;strong&gt;Gemini Vision&lt;/strong&gt; (&lt;code&gt;gemini-2.5-flash&lt;/code&gt;) for automated damage assessment:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"damageType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"water_leak"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"severity"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"HIGH"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"confidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.92&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Active water leak from ceiling with visible water damage and staining on drywall"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This structured assessment automatically sets the work order priority. A high-severity water leak becomes URGENT and gets an aggressive SLA deadline. A cosmetic scratch stays LOW priority. The manager sees this assessment alongside the tenant's photo in the dashboard — no manual triage needed.&lt;/p&gt;
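&lt;p&gt;The step from assessment to priority could look roughly like the sketch below. The &lt;code&gt;Assessment&lt;/code&gt; fields come from the JSON above; the priority names other than URGENT and LOW, the SLA windows, and the 0.7 review threshold are assumptions chosen only to make the example concrete.&lt;/p&gt;

```typescript
// Hypothetical mapping and SLA windows; the article only states HIGH leaks become URGENT.
type Severity = "LOW" | "MEDIUM" | "HIGH";
type Priority = "LOW" | "NORMAL" | "URGENT";

interface Assessment {
  damageType: string;
  severity: Severity;
  confidence: number; // 0..1, as in the JSON above
  description: string;
}

// Assumed SLA windows in hours per priority.
const SLA_HOURS: { [P in Priority]: number } = { LOW: 72, NORMAL: 24, URGENT: 4 };

// Assumed cutoff below which a manager should double-check the assessment.
const REVIEW_THRESHOLD = 0.7;

function toPriority(severity: Severity): Priority {
  switch (severity) {
    case "HIGH":
      return "URGENT";
    case "MEDIUM":
      return "NORMAL";
    case "LOW":
      return "LOW";
  }
}

function planWorkOrder(a: Assessment, reportedAt: Date) {
  const priority = toPriority(a.severity);
  const slaMillis = SLA_HOURS[priority] * 60 * 60 * 1000;
  const slaDeadline = new Date(reportedAt.getTime() + slaMillis);
  const confident = a.confidence >= REVIEW_THRESHOLD;
  return { priority, slaDeadline, needsReview: !confident };
}
```

&lt;p&gt;Keeping the mapping in plain code rather than in the prompt means the model only describes what it sees, while the business rules stay deterministic and easy to change.&lt;/p&gt;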

&lt;h3&gt;
  
  
  4. Google Cloud Run — Production Deployment
&lt;/h3&gt;

&lt;p&gt;The NestJS backend is containerized with a multi-stage Docker build (Node 20 Alpine) and deployed to &lt;strong&gt;Google Cloud Run&lt;/strong&gt;. This gives us:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Auto-scaling (including scale-to-zero for cost efficiency)&lt;/li&gt;
&lt;li&gt;HTTPS out of the box&lt;/li&gt;
&lt;li&gt;WebSocket support for real-time dashboard updates&lt;/li&gt;
&lt;li&gt;Easy environment variable management for secrets&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Deployment is automated through GitHub Actions: a push to &lt;code&gt;main&lt;/code&gt; triggers a build, pushes the image to Artifact Registry, and deploys it to Cloud Run.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Real-Time Architecture
&lt;/h2&gt;

&lt;p&gt;What makes OPSLY feel alive is that every state change propagates instantly across all three user roles:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Tenant reports&lt;/strong&gt; via voice → work order appears on manager's dashboard (WebSocket push)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Manager assigns&lt;/strong&gt; technician → tenant gets notification, technician gets job in queue&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Technician sends ETA&lt;/strong&gt; → tenant sees countdown timer&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Technician completes job&lt;/strong&gt; → manager's KPIs update, tenant sees resolution&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;All of this happens through Socket.IO rooms filtered by role. Managers see everything. Tenants only see their own work orders. Technicians only see their assigned jobs.&lt;/p&gt;
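&lt;p&gt;The visibility rules above can be expressed as two small functions: one for the rooms a user joins, one for the rooms an event is sent to. The room names and the &lt;code&gt;WorkOrderEvent&lt;/code&gt; shape here are hypothetical; the article does not show OPSLY's actual naming.&lt;/p&gt;

```typescript
// Hypothetical room naming; the article only says rooms are filtered by role.
type Role = "MANAGER" | "TENANT" | "TECHNICIAN";

interface WorkOrderEvent {
  workOrderId: string;
  tenantId: string;
  technicianId: string | null; // null until a manager assigns someone
}

// Rooms a user joins on connect: one shared room for managers, a private room otherwise.
function roomsFor(role: Role, userId: string): string[] {
  if (role === "MANAGER") {
    return ["managers"];
  }
  return [role === "TENANT" ? "tenant:" + userId : "technician:" + userId];
}

// Rooms an event goes to: all managers, the reporting tenant, and the assignee if any.
function roomsToNotify(event: WorkOrderEvent): string[] {
  const rooms = ["managers", "tenant:" + event.tenantId];
  if (event.technicianId !== null) {
    rooms.push("technician:" + event.technicianId);
  }
  return rooms;
}
```

&lt;p&gt;With Socket.IO, each connection would call &lt;code&gt;socket.join(room)&lt;/code&gt; for the rooms returned by &lt;code&gt;roomsFor&lt;/code&gt;, and the server would emit to each room from &lt;code&gt;roomsToNotify&lt;/code&gt; via &lt;code&gt;io.to(room).emit(...)&lt;/code&gt;.&lt;/p&gt;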




&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Gemini Live API's barge-in handling is genuinely impressive.&lt;/strong&gt; In earlier prototypes with non-streaming approaches, interrupting the AI felt jarring. With Gemini Live, the agent naturally stops, acknowledges the interruption, and continues the conversation. This is what makes it feel like talking to a person rather than a bot.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi-agent orchestration with ADK is cleaner than monolithic agents.&lt;/strong&gt; Each specialist agent has a small, focused prompt and a limited set of tools. The router agent's job is simple: classify intent and delegate. This separation made debugging much easier — when the schedule agent gave a wrong response, I only had to fix one prompt, not untangle a 2000-line system prompt.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gemini Vision for structured assessment is a time-saver.&lt;/strong&gt; Instead of building a custom damage classification model, Gemini Vision returns structured JSON with damage type, severity, and confidence. For a property management context, this is accurate enough to automate triage, and the confidence score lets managers know when to double-check.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Live demo:&lt;/strong&gt; &lt;a href="https://frontend-theta-dusky-58.vercel.app" rel="noopener noreferrer"&gt;https://frontend-theta-dusky-58.vercel.app&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/hislordshipprof/opsly" rel="noopener noreferrer"&gt;https://github.com/hislordshipprof/opsly&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Demo accounts (all use password &lt;code&gt;password123&lt;/code&gt;):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tenant: &lt;code&gt;tenant@opsly.io&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Manager: &lt;code&gt;sarah@opsly.io&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Technician: &lt;code&gt;mike@opsly.io&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;OPSLY was built solo using Claude Code for the Gemini Live Agent Challenge hackathon. The entire platform — backend, frontend, AI agents, voice + vision integration, and deployment — was built in under two weeks.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;#GeminiLiveAgentChallenge&lt;/em&gt;&lt;/p&gt;

</description>
      <category>google</category>
      <category>gemini</category>
      <category>ai</category>
      <category>hackathon</category>
    </item>
  </channel>
</rss>
