<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Pratima Sapkota</title>
    <description>The latest articles on DEV Community by Pratima Sapkota (@pratima-sapkota).</description>
    <link>https://dev.to/pratima-sapkota</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3825401%2Fb4b554ff-1684-492c-b3ad-c2267ed82563.jpg</url>
      <title>DEV Community: Pratima Sapkota</title>
      <link>https://dev.to/pratima-sapkota</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/pratima-sapkota"/>
    <language>en</language>
    <item>
      <title>Building Argus: A Voice-Driven SOC Copilot with Gemini Live</title>
      <dc:creator>Pratima Sapkota</dc:creator>
      <pubDate>Mon, 16 Mar 2026 23:09:14 +0000</pubDate>
      <link>https://dev.to/pratima-sapkota/building-argus-a-voice-driven-soc-copilot-with-gemini-live-2np6</link>
      <guid>https://dev.to/pratima-sapkota/building-argus-a-voice-driven-soc-copilot-with-gemini-live-2np6</guid>
      <description>&lt;p&gt;When a critical alert flashes at 3:00 AM, SOC analysts usually waste precious minutes manually writing SQL and correlating data across disconnected dashboards. In cybersecurity, this manual approach is too slow.&lt;/p&gt;

&lt;p&gt;What if you could just &lt;em&gt;talk&lt;/em&gt; to your logs and share screenshots of anomalies? &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Argus&lt;/strong&gt; is a real-time, multimodal SOC AI agent. You can ask it to "show high-severity traffic," or upload a screenshot of a suspicious process, and it instantly queries Google BigQuery, updates device states in Firestore, and pushes live visual updates to a dynamic dashboard—all perfectly synced with its spoken responses.&lt;/p&gt;

&lt;h3&gt;Try It Out&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;YouTube Demo:&lt;/strong&gt; &lt;a href="https://youtu.be/5aQJt5LAPxk" rel="noopener noreferrer"&gt;https://youtu.be/5aQJt5LAPxk&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Live Web App:&lt;/strong&gt; &lt;a href="https://argus-frontend-215980001921.us-central1.run.app" rel="noopener noreferrer"&gt;https://argus-frontend-215980001921.us-central1.run.app&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub Repo:&lt;/strong&gt; &lt;a href="https://github.com/pratima-sapkota/argus" rel="noopener noreferrer"&gt;https://github.com/pratima-sapkota/argus&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;The Tech Stack&lt;/h3&gt;

&lt;p&gt;Argus relies on a single multiplexed WebSocket connection to stream bidirectional voice and data.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AI:&lt;/strong&gt; Gemini Live API (&lt;code&gt;gemini-live-2.5-flash-native-audio&lt;/code&gt;) via &lt;code&gt;google-genai&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backend:&lt;/strong&gt; FastAPI, Python 3.13, WebSockets, Pillow (for image processing)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frontend:&lt;/strong&gt; React 19, Vite, Tailwind CSS, Web Audio API&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data &amp;amp; State:&lt;/strong&gt; Google BigQuery (telemetry) and Cloud Firestore (device states)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hosting:&lt;/strong&gt; Google Cloud Run, Cloud Build, Artifact Registry&lt;/li&gt;
&lt;/ul&gt;
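&lt;p&gt;As a sketch of how one connection can carry everything, here is a minimal, hypothetical framing scheme: every message is JSON with a &lt;code&gt;type&lt;/code&gt; tag so audio, tool results, and images can share the socket. The field names are illustrative; the actual Argus wire format isn't shown in this post.&lt;/p&gt;

```python
import base64
import json

# Hypothetical envelope for a single multiplexed WebSocket: each frame
# is JSON tagged with a "type" so binary audio/image payloads and
# structured tool results can interleave on one connection.
def make_frame(kind, payload):
    """Wrap a payload in the shared envelope (illustrative field names)."""
    if kind == "audio":
        # raw PCM16 bytes are base64-encoded for the JSON transport
        body = {"type": "audio", "data": base64.b64encode(payload).decode("ascii")}
    elif kind == "image":
        body = {"type": "image", "data": base64.b64encode(payload).decode("ascii")}
    elif kind == "tool_result":
        body = {"type": "tool_result", "rows": payload}
    else:
        raise ValueError(f"unknown frame kind: {kind}")
    return json.dumps(body)

def parse_frame(raw):
    """Decode an incoming frame back into (kind, payload)."""
    body = json.loads(raw)
    kind = body["type"]
    if kind in ("audio", "image"):
        return kind, base64.b64decode(body["data"])
    return kind, body.get("rows")
```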

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frgdnp5pbhmojk6dqke7n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frgdnp5pbhmojk6dqke7n.png" alt="Argus system architecture diagram"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;How Gemini and Google Cloud Power Argus&lt;/h3&gt;

&lt;p&gt;The React frontend captures microphone audio and streams raw PCM16 frames over a WebSocket to a FastAPI backend running on &lt;strong&gt;Google Cloud Run&lt;/strong&gt;. The backend opens a persistent session with the &lt;strong&gt;Gemini Live API&lt;/strong&gt; (&lt;code&gt;gemini-live-2.5-flash-native-audio&lt;/code&gt;) via the &lt;code&gt;google-genai&lt;/code&gt; SDK, forwarding the audio in real time. When the user asks a question like "show me suspicious traffic on port 443," Gemini responds with a function call—the backend executes that call as a parameterized query against &lt;strong&gt;Google BigQuery&lt;/strong&gt;, returns the results to Gemini for a spoken summary, and simultaneously pushes the structured data back to the frontend over the same WebSocket. Device state changes (blocking or unblocking IPs) are written to &lt;strong&gt;Cloud Firestore&lt;/strong&gt;, and the React dashboard picks them up instantly through Firebase real-time listeners—no polling required. The entire system is deployed automatically via &lt;strong&gt;GitHub Actions&lt;/strong&gt; triggering &lt;strong&gt;Google Cloud Build&lt;/strong&gt;, which builds Docker images, pushes them to &lt;strong&gt;Artifact Registry&lt;/strong&gt;, and deploys both services to Cloud Run.&lt;/p&gt;
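&lt;p&gt;The fan-out step of that loop can be sketched roughly as follows. The handler name, stubbed query, and payload shapes are illustrative assumptions, not Argus's actual code; the point is that one tool execution feeds both Gemini (for the spoken summary) and the dashboard socket.&lt;/p&gt;

```python
# Illustrative sketch of the backend's tool-call loop: Gemini emits a
# function call, the backend executes it once, and the result is fanned
# out to BOTH Gemini and the dashboard WebSocket. Names are hypothetical.

def query_traffic(port, severity):
    # In Argus this would run a parameterized BigQuery query; stubbed here.
    return [{"port": port, "severity": severity, "count": 42}]

TOOL_HANDLERS = {"query_traffic": query_traffic}

def handle_function_call(call, send_to_gemini, send_to_dashboard):
    """Execute a Gemini function call and fan the result out twice."""
    handler = TOOL_HANDLERS[call["name"]]
    result = handler(**call["args"])
    # Gemini gets a function response so it can narrate the findings...
    send_to_gemini({"function_response": {"name": call["name"], "response": result}})
    # ...while the dashboard receives the same rows for visual rendering.
    send_to_dashboard({"type": "tool_result", "rows": result})
    return result
```

Sending both messages from the same code path is what keeps the spoken summary and the on-screen data in lockstep.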

&lt;h3&gt;Challenges&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Multimodal Synchronization:&lt;/strong&gt; Syncing raw PCM16 audio, function-call results, and high-resolution image data over a single WebSocket without dropping frames was the hardest engineering problem.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;UI State Management:&lt;/strong&gt; Managing live updates—like grouping and stacking blocked/unblocked device events—required careful React state balancing alongside real-time Firestore listeners.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deployment:&lt;/strong&gt; Deploying the tightly coupled frontend and backend services to Cloud Run required a multi-step CI/CD pipeline using Workload Identity Federation.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;What I Learned&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Native multimodal models change everything.&lt;/strong&gt; Bypassing STT/TTS services for native audio (plus Vision capabilities) drastically reduces latency and enables natural "barge-in" interactions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parallel UI sync is vital.&lt;/strong&gt; Sending tool execution results to both Gemini and the visual dashboard simultaneously makes the AI feel profoundly responsive.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI SQL is safe with strict parameters.&lt;/strong&gt; AI can write dynamic BigQuery filters safely as long as parameterized &lt;code&gt;QueryJobConfig&lt;/code&gt; strictly bounds the execution.&lt;/li&gt;
&lt;/ul&gt;
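&lt;p&gt;A minimal sketch of that last pattern: the model is only allowed to pick filter &lt;em&gt;values&lt;/em&gt;, never SQL text. Column names are checked against a whitelist and values are bound as named parameters (in Argus, via BigQuery's &lt;code&gt;QueryJobConfig&lt;/code&gt;). The whitelist and table name here are assumptions for illustration, not the real Argus schema.&lt;/p&gt;

```python
# Illustrative guardrail for AI-generated filters: whitelist columns,
# parameterize values. The params list mirrors the shape BigQuery's
# named query parameters take; column/table names are hypothetical.
ALLOWED_COLUMNS = {"port", "severity", "src_ip"}

def build_query(filters):
    """Return (sql, params) with every model-supplied value bound as a parameter."""
    clauses, params = [], []
    for column, value in filters.items():
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"column not allowed: {column}")
        clauses.append(f"{column} = @{column}")
        params.append({"name": column, "type": "STRING", "value": str(value)})
    where = " AND ".join(clauses) if clauses else "TRUE"
    sql = f"SELECT * FROM telemetry.events WHERE {where} LIMIT 100"
    return sql, params
```

Because the model never produces raw SQL, a prompt-injected filter value can at worst match zero rows; it cannot change the query's shape.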

&lt;h3&gt;Future Directions&lt;/h3&gt;

&lt;p&gt;Currently a proof-of-concept, Argus has massive potential. Next steps include direct Splunk/Sentinel integration, proactive voice alerting for anomalies, and multi-agent workflows for automated malware reverse-engineering.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This post was written as an entry for the hackathon.&lt;/em&gt;&lt;br&gt;
&lt;code&gt;#GeminiLiveAgentChallenge&lt;/code&gt; &lt;code&gt;#GoogleCloud&lt;/code&gt; &lt;code&gt;#GeminiAI&lt;/code&gt; &lt;code&gt;#Cybersecurity&lt;/code&gt; &lt;code&gt;#ReactJS&lt;/code&gt; &lt;code&gt;#Python&lt;/code&gt;&lt;/p&gt;

</description>
      <category>geminiliveagentchallenge</category>
      <category>cybersecurity</category>
      <category>ai</category>
      <category>python</category>
    </item>
  </channel>
</rss>
