When a critical alert flashes at 3:00 AM, SOC analysts usually waste precious minutes manually writing SQL and correlating data across disconnected dashboards. In cybersecurity, this manual approach is too slow.
What if you could just talk to your logs and share screenshots of anomalies?
Argus is a real-time, multimodal SOC AI agent. You can ask it to "show high-severity traffic," or upload a screenshot of a suspicious process, and it instantly queries Google BigQuery, updates device states in Firestore, and pushes live visual updates to a dynamic dashboard—all perfectly synced with its spoken responses.
Try It Out
- YouTube Demo: https://youtu.be/5aQJt5LAPxk
- Live Web App: https://argus-frontend-215980001921.us-central1.run.app
- GitHub Repo: https://github.com/pratima-sapkota/argus
The Tech Stack
Argus relies on a single multiplexed WebSocket connection to stream bidirectional voice and data.
- AI: Gemini Live API (gemini-live-2.5-flash-native-audio) via google-genai
- Backend: FastAPI, Python 3.13, WebSockets, Pillow (for image processing)
- Frontend: React 19, Vite, Tailwind CSS, Web Audio API
- Data & State: Google BigQuery (telemetry) and Cloud Firestore (device states)
- Hosting: Google Cloud Run, Cloud Build, Artifact Registry
How Gemini and Google Cloud Power Argus
The React frontend captures microphone audio and streams raw PCM16 frames over a WebSocket to a FastAPI backend running on Google Cloud Run. The backend opens a persistent session with the Gemini Live API (gemini-live-2.5-flash-native-audio) via the google-genai SDK, forwarding the audio in real time. When the user asks a question like "show me suspicious traffic on port 443," Gemini responds with a function call—the backend executes that call as a parameterized query against Google BigQuery, returns the results to Gemini for a spoken summary, and simultaneously pushes the structured data back to the frontend over the same WebSocket. Device state changes (blocking or unblocking IPs) are written to Cloud Firestore, and the React dashboard picks them up instantly through Firebase real-time listeners—no polling required. The entire system is deployed automatically via GitHub Actions triggering Google Cloud Build, which builds Docker images, pushes them to Artifact Registry, and deploys both services to Cloud Run.
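The "single multiplexed WebSocket" carrying audio, tool results, and dashboard updates can be sketched as a small framing layer. This is a hypothetical illustration, not Argus's actual wire format: each message gets a type tag so the receiver can route raw PCM16 audio (base64-encoded into JSON text frames) separately from structured data.

```python
import base64
import json


def encode_frame(frame_type: str, payload) -> str:
    """Wrap a payload in a typed envelope for a multiplexed WebSocket.

    Binary payloads (raw PCM16 audio, image bytes) are base64-encoded so
    everything travels as JSON text frames; structured payloads pass through.
    Hypothetical framing, not the project's real protocol.
    """
    if isinstance(payload, (bytes, bytearray)):
        body = {"encoding": "base64", "data": base64.b64encode(payload).decode("ascii")}
    else:
        body = {"encoding": "json", "data": payload}
    return json.dumps({"type": frame_type, **body})


def decode_frame(raw: str):
    """Reverse of encode_frame: returns (frame_type, payload)."""
    msg = json.loads(raw)
    if msg["encoding"] == "base64":
        return msg["type"], base64.b64decode(msg["data"])
    return msg["type"], msg["data"]


# Different modalities share one connection, distinguished only by the tag.
audio_frame = encode_frame("audio_pcm16", b"\x00\x01\x02\x03")
data_frame = encode_frame("tool_result", {"rows": 12, "severity": "high"})

assert decode_frame(audio_frame) == ("audio_pcm16", b"\x00\x01\x02\x03")
assert decode_frame(data_frame) == ("tool_result", {"rows": 12, "severity": "high"})
```

With an envelope like this, the backend can interleave a spoken-audio chunk and the matching BigQuery result set on the same socket, and the frontend dispatches on the type field.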
Challenges
- Multimodal Synchronization: Syncing raw PCM16 audio, function call results, and high-res image data over a single WebSocket without dropping frames was tough.
- UI State Management: Managing live updates—like grouping and stacking blocked/unblocked device events—required careful React state balancing alongside real-time Firestore listeners.
- Deployment: Deploying perfectly coupled frontend and backend services to Cloud Run required a multi-step CI/CD pipeline using Workload Identity Federation.
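The event-grouping problem from the UI challenge above is easier to see in code. A minimal Python sketch of the idea (the real logic lives in React state, and these field names are assumptions): collapse consecutive events for the same device into one stacked entry so rapid block/unblock toggles don't flood the feed.

```python
from itertools import groupby


def stack_events(events):
    """Collapse consecutive events for the same device into one stacked entry.

    `events` is an ordered list of dicts like
    {"device": "10.0.0.5", "action": "blocked"} (hypothetical shape).
    Consecutive entries for the same device merge into one row that keeps
    the latest action and a count of how many events it absorbed.
    """
    stacked = []
    for device, run in groupby(events, key=lambda e: e["device"]):
        run = list(run)
        stacked.append({
            "device": device,
            "action": run[-1]["action"],  # latest state wins
            "count": len(run),
        })
    return stacked


events = [
    {"device": "10.0.0.5", "action": "blocked"},
    {"device": "10.0.0.5", "action": "unblocked"},
    {"device": "10.0.0.9", "action": "blocked"},
]
print(stack_events(events))
# → [{'device': '10.0.0.5', 'action': 'unblocked', 'count': 2},
#    {'device': '10.0.0.9', 'action': 'blocked', 'count': 1}]
```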
What I Learned
- Native multimodal models change everything. Bypassing STT/TTS services for native audio (plus Vision capabilities) drastically reduces latency and enables natural "barge-in" interactions.
- Parallel UI sync is vital. Sending tool execution results to both Gemini and the visual dashboard simultaneously makes the AI feel profoundly responsive.
- AI SQL is safe with strict parameters. AI can write dynamic BigQuery filters safely as long as a parameterized QueryJobConfig strictly bounds the execution.
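A minimal sketch of what "strictly bounds the execution" can mean in practice. This is an assumed design, not Argus's code: model-proposed filter fields are checked against an allowlist, and values are never interpolated into the SQL string — they become named parameters, the role ScalarQueryParameter inside a QueryJobConfig plays in the real google-cloud-bigquery client. The table name telemetry.events is hypothetical.

```python
# Allowlist of filter fields the model may use, with expected BigQuery types.
ALLOWED_FILTERS = {"port": "INT64", "severity": "STRING", "src_ip": "STRING"}


def build_bounded_query(filters: dict):
    """Turn model-proposed filters into a parameterized query.

    Field names must appear in the allowlist; values become named
    parameters (@field) instead of being spliced into the SQL, so the
    model can never inject arbitrary SQL through a filter value.
    """
    clauses, params = [], []
    for field, value in filters.items():
        if field not in ALLOWED_FILTERS:
            raise ValueError(f"filter not allowed: {field}")
        clauses.append(f"{field} = @{field}")
        params.append({"name": field, "type": ALLOWED_FILTERS[field], "value": value})
    sql = "SELECT * FROM telemetry.events WHERE " + " AND ".join(clauses) + " LIMIT 100"
    return sql, params


sql, params = build_bounded_query({"port": 443, "severity": "high"})
print(sql)
# → SELECT * FROM telemetry.events WHERE port = @port AND severity = @severity LIMIT 100
```

The LIMIT and the allowlist are the bounds: no matter what filters the model invents, the query shape, the touched table, and the result size stay fixed.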
Future Directions
Argus is currently a proof of concept with plenty of room to grow. Next steps include direct Splunk/Sentinel integration, proactive voice alerting for anomalies, and multi-agent workflows for automated malware reverse-engineering.
This piece of content was created for the purposes of entering this hackathon.
#GeminiLiveAgentChallenge #GoogleCloud #GeminiAI #Cybersecurity #ReactJS #Python
