Traditional security camera stacks built with OpenCV and Flask often break down under real-world load. When video processing and streaming all run inside a blocking while True loop, a single slow operation can stall the entire pipeline—freezing the live feed for every connected viewer.
In this tutorial, we will build a responsive, scalable security dashboard that avoids these limitations by separating video delivery from detection logic. We will use:
- Stream Vision Agents: For global, low-latency video infrastructure.
- YOLO11: For human pose estimation and object detection.
- React: For a real-time monitoring and alerting dashboard.
This architecture ensures that the video feed remains smooth for the end users.
What We're Building
By the end of this tutorial, you will have a security monitoring dashboard with the following capabilities:
- Live Video Panel: A real-time WebRTC stream with pose-skeleton overlays generated by YOLO
- Event Log Panel: Timestamped intrusion alerts that update instantly as events occur
- Smart Detection: Automatic alerts triggered when a person enters the monitored area
- Visual Feedback: Immediate UI cues, such as a red video border, when a threat is detected
Technical Prerequisites
- Stream API credentials
- LLM API Key (Gemini, OpenAI or Anthropic)
- Python 3.11+
- Node.js 18+ (for React tooling)
Setup & Installation
Backend Setup (Python)
We use uv, a modern, high-speed Python package manager, to handle dependencies. Its significantly faster resolution times and built-in environment handling make it ideal for agent-based workflows.
# 1. Create the project folders
mkdir security-agent
cd security-agent
# 2. Install uv (if you don't have it)
curl -LsSf https://astral.sh/uv/install.sh | sh
# 3. Initialise the project and install dependencies
uv init
uv add "vision-agents[getstream, gemini]" vision-agents-plugins-ultralytics ultralytics
Add the necessary keys to a .env file:
STREAM_API_KEY=your_stream_key_here
STREAM_API_SECRET=your_stream_secret_here
GEMINI_API_KEY=your_llm_key_here
Models auto-download on first run, so this step is optional. Running it now, however, ensures your agent starts instantly without waiting for downloads:
uv run python -c "from ultralytics import YOLO; YOLO('yolo11n.pt'); YOLO('yolo11n-pose.pt')"
Frontend Setup (React)
Now, let's create our React application with Vite and install the needed dependencies:
npm create vite@latest security-frontend -- --template react-ts
cd security-frontend
npm install @stream-io/video-react-sdk
System Architecture
The system is split into three distinct layers:
- Frontend (React): Handles the WebRTC connection and listens for the custom WebSocket events (intrusion alerts) sent by the backend.
- Edge (Stream): Ingests the camera feed and routes frames to the backend via a low-latency protocol.
- Backend (Python): Runs the Vision Agent.
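These layers communicate through a single, small custom-event contract. As a reference, here is the payload shape the backend emits and the frontend matches on (the field names come from the code later in this tutorial):

```python
import time

def intrusion_payload(person_count: int) -> dict:
    """Build the custom-event payload the backend broadcasts to the dashboard."""
    return {
        "alert_trigger": "intrusion",                       # key the frontend matches on
        "message": f"{person_count} Intruder(s) Detected",  # human-readable log line
        "person_count": person_count,
        "timestamp": time.time(),
    }
```

Keeping this contract in one place makes it easy to verify that the backend sender and the frontend listener agree on field names.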
The Optimisation Strategy
We use two YOLO models working together to balance visual feedback with performance:
- YOLO11 Pose (yolo11n-pose.pt): Responsible for rendering pose skeletons for visual confirmation. This runs on every frame and prioritises clarity over speed.
- YOLO11 Detection (yolo11n.pt): Optimised for fast person detection, used only for logical decisions and alerting.
The Filtering Logic
When YOLO processes a frame, it detects everything: chairs, cups, plants, and people. YOLO models are trained on the COCO (Common Objects in Context) dataset, which assigns a numeric ID to each class, and class 0 is "person". We filter out the noise by keeping only detections where class_id == 0.
This approach transforms a generic object detector into a specialised security guard that ignores your cat running past but triggers instantly for humans.
# YOLO returns ALL detected objects
results = model(frame)
person_count = 0

for box in results[0].boxes:
    # Check the class ID
    class_id = int(box.cls)
    if class_id == 0:  # 0 = Person
        # This is our target: count it, draw it, alert!
        person_count += 1
    else:
        # Ignore chairs, tables, pets, etc.
        pass

if person_count > 0:
    send_alert("Intruder detected!")
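For context, the class IDs come from the 80-class COCO label map (in Ultralytics, `model.names` exposes the full mapping). A minimal stand-in with a few representative entries, plus the person filter from above as a pure function:

```python
# A few entries from the 80-class COCO label map used by YOLO models
COCO_NAMES = {0: "person", 2: "car", 15: "cat", 16: "dog", 56: "chair"}

def count_people(class_ids: list[int]) -> int:
    """Count only detections whose class ID is 0 (person)."""
    return sum(1 for cid in class_ids if cid == 0)
```

For example, a frame containing two people, a chair, and a cat (`count_people([0, 56, 15, 0])`) yields a count of 2.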
Building the Security Agent (Backend)
Project Structure
The backend is organised around a single Python agent responsible for video processing, detection logic, and event emission.
security-agent/
├── .venv/
├── .env # API keys and secrets
├── generate_token.py # Development utility for creating access tokens
├── main.py # Vision agent entry point
├── yolo11n-pose.pt # YOLO pose model (excluded from version control)
└── yolo11n.pt # YOLO detection model (excluded from version control)
Generating Tokens for Development
Since we don’t have a full authentication backend for this demo, we need to generate a Stream access token manually so the frontend can connect to the video stream.
Choosing a User ID
You need to decide on a user_id for your admin viewer.
- Existing Users: If you already have a user in your Stream dashboard, reuse that ID.
- New Users: If not, simply pick a name (e.g., admin_user). Stream will automatically create this user the first time they connect with a valid token.
Create a file named generate_token.py in your backend directory. This script uses your server-side API secret to sign a token for the chosen user.
Security Note: Tokens must always be generated server-side using your Stream API secret. In this demo, we manually generate a temporary development token. In a production application, this token would be issued by your authentication or login endpoint.
import os
from getstream import Stream

# 1. Get keys from the environment (loaded by uv)
key = os.getenv("STREAM_API_KEY")
secret = os.getenv("STREAM_API_SECRET")

if not key or not secret:
    print("❌ Error: Could not find API keys. Make sure you run with --env-file .env")
    exit(1)

# 2. Initialise Stream
client = Stream(key, secret)

# 3. Generate a token for our chosen user
user_id = "Shaq12"
token = client.create_token(user_id)

print(f"\n✅ TOKEN FOR {user_id}:")
print(token)
print("\nCopy the string above and paste it into your frontend .env file")
Run the script using:
uv run --env-file .env generate_token.py
The generated token will be used later in the frontend configuration.
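Stream tokens are standard JWTs, so you can sanity-check the claims locally before wiring up the frontend. Here is a small, dependency-free sketch; note that it decodes without verifying the signature, so use it only for inspection:

```python
import base64
import json

def jwt_claims(token: str) -> dict:
    """Decode a JWT's payload segment (unverified) to inspect its claims."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore the stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))
```

The decoded claims should include a `user_id` matching the ID you chose above.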
The Foundation (Imports & Config)
First, we need to import our tools and load the API keys.
import asyncio
import os
import time
import logging
import sys
from vision_agents.core import agents, User
import vision_agents.plugins.getstream as getstream_plugin
from getstream import AsyncStream
from vision_agents.plugins.ultralytics import YOLOPoseProcessor
from ultralytics import YOLO
import vision_agents.plugins.gemini as gemini
# Setup logging to see what's happening in the terminal
logging.basicConfig(level=logging.INFO, format='%(message)s', stream=sys.stdout)
# Global Config
ACTIVE_CALL_ID = "security-demo-working"
STREAM_CLIENT = None # We will fill this later
The Custom Processor
To make the agent intelligent, we override the default video processor and introduce a custom SecurityYOLOProcessor. Running object detection on every frame is computationally expensive and can quickly introduce lag. Instead of processing all 30 frames per second, we split the workload into two distinct paths: visuals and logic.
For visuals, we call the parent YOLOPoseProcessor on every frame to render the skeleton overlay at 30 FPS, ensuring smooth and responsive video output.
For decision-making, we run the detection model only every fifth frame using a simple modulo check (frame_count % 5 == 0). This means the agent “thinks” about intruder detection just six times per second—dramatically reducing CPU usage while remaining fast enough to react to humans in the scene.
class SecurityYOLOProcessor(YOLOPoseProcessor):
    def __init__(self, model="yolo11n-pose.pt", confidence=0.5, cooldown=5.0):
        super().__init__(model=model, confidence=confidence)
        self.cooldown = cooldown
        self.last_alert_time = 0
        self.frame_count = 0
        print("📦 Loading detection model...", flush=True)
        # Load the secondary model for counting
        self.detect_model = YOLO("yolo11n.pt")

    async def _process_pose_async(self, frame_data):
        # 1. Let the parent draw the skeleton (visuals)
        annotated = await super()._process_pose_async(frame_data)

        # 2. Run our counter logic (optimisation)
        self.frame_count += 1
        if self.frame_count % 5 == 0:  # Check every 5th frame
            await self._run_detection(frame_data)

        return annotated

    async def _run_detection(self, frame_data):
        # ... (Detection logic to count people) ...
        # If person_count > 0, call self._send_alert()
        pass
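To see how the frame throttle and the alert cooldown interact, here is a standalone sketch (not part of the agent) that simulates a person standing in frame continuously. It works in whole frames to keep the arithmetic exact:

```python
def simulate_intrusion(frames: int, fps: int = 30, every_n: int = 5,
                       cooldown_s: float = 5.0) -> tuple[int, int]:
    """Return (detection passes, alerts sent) for a continuous intrusion."""
    cooldown_frames = int(cooldown_s * fps)  # 150 frames at 30 FPS
    last_alert_frame = -cooldown_frames      # allow the first detection to alert
    detections = alerts = 0
    for frame in range(1, frames + 1):
        if frame % every_n != 0:
            continue                         # the modulo throttle skips this frame
        detections += 1
        if frame - last_alert_frame >= cooldown_frames:
            alerts += 1
            last_alert_frame = frame
    return detections, alerts
```

Over 30 seconds (900 frames), `simulate_intrusion(900)` yields 180 detection passes but only 6 alerts: the throttle keeps CPU usage low, and the cooldown keeps the event log readable.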
The Agent Assembly
Next, we wire the video processing and reasoning layers together using the agents.Agent class. This wrapper acts as the bridge between two core components:
- The edge layer, which manages the real-time video connection using your Stream credentials.
- The LLM layer (e.g., Gemini), which provides the core reasoning runtime for Vision Agents. Even if you don’t use language or reasoning capabilities in this dashboard, an LLM must still be configured so the agent runtime initialises properly and can orchestrate video processing and events.
We then inject our SecurityYOLOProcessor into the processors list. This tells the agent to apply our custom video-processing logic before publishing the video stream.
async def main():
    global STREAM_CLIENT

    # 1. Authenticate
    api_key = os.getenv("STREAM_API_KEY")
    api_secret = os.getenv("STREAM_API_SECRET")
    client = AsyncStream(api_key=api_key, api_secret=api_secret)
    STREAM_CLIENT = client

    # 2. Join the call
    call = client.video.call("default", ACTIVE_CALL_ID)
    await call.get_or_create(data={"created_by": {"id": "security-bot"}})

    # 3. Assemble the agent
    agent = agents.Agent(
        edge=getstream_plugin.Edge(api_key=api_key, api_secret=api_secret),
        llm=gemini.LLM(model="gemini-1.5-flash", api_key=os.getenv("GEMINI_API_KEY")),
        agent_user=User(id="security_agent", name="Security Guard"),
        processors=[SecurityYOLOProcessor()]  # <--- Inject our custom processor
    )
    await agent.join(call)

    # Keep alive
    try:
        while True:
            await asyncio.sleep(1)
    except KeyboardInterrupt:
        pass

if __name__ == "__main__":
    asyncio.run(main())
Complete Backend Code (main.py)
Now that we’ve defined the custom processor and the agent configuration, we can combine them into the final executable script. Create a file named main.py and paste in the code below.
import asyncio
import os
import time
import logging
import sys

from vision_agents.core import agents, User
import vision_agents.plugins.getstream as getstream_plugin
from getstream import AsyncStream
from vision_agents.plugins.ultralytics import YOLOPoseProcessor
from ultralytics import YOLO
import vision_agents.plugins.gemini as gemini

logging.basicConfig(level=logging.INFO, format='%(message)s', stream=sys.stdout)
logging.getLogger("vision_agents").setLevel(logging.CRITICAL)

print("\n🚀 SECURITY DASHBOARD AGENT\n", flush=True)

# --- GLOBAL VARIABLES ---
ACTIVE_CALL_ID = "security-demo-working"
STREAM_CLIENT = None


class SecurityYOLOProcessor(YOLOPoseProcessor):
    def __init__(self, model="yolo11n-pose.pt", confidence=0.5, cooldown=5.0):
        super().__init__(model=model, confidence=confidence)
        self.cooldown = cooldown
        self.last_alert_time = 0
        self.frame_count = 0
        print("📦 Loading detection model...", flush=True)
        self.detect_model = YOLO("yolo11n.pt")
        print("✅ Processor Ready", flush=True)

    async def _process_pose_async(self, frame_data):
        # 1. Parent logic: draw the pose skeleton
        annotated = await super()._process_pose_async(frame_data)

        self.frame_count += 1
        if self.frame_count % 30 == 0:
            print(f"💓 Frame {self.frame_count}", flush=True)

        # 2. Detection logic: run the counter model every 5th frame
        try:
            if self.frame_count % 5 == 0:
                results = self.detect_model(frame_data, verbose=False)
                person_count = 0
                for box in results[0].boxes:
                    if int(box.cls) == 0:  # class 0 = person
                        person_count += 1
                if person_count > 0:
                    await self._send_alert(person_count)
        except Exception:
            # Never let a detection error break the video pipeline
            pass

        return annotated

    async def _send_alert(self, person_count):
        now = time.time()
        if (now - self.last_alert_time) > self.cooldown:
            print(f"\n🚨 {person_count} PERSON(S) DETECTED! Sending Alert...", flush=True)
            try:
                if STREAM_CLIENT:
                    # Flat payload: Stream delivers this to the frontend
                    # under event.custom
                    payload = {
                        "alert_trigger": "intrusion",  # <--- the key the frontend checks
                        "message": f"{person_count} Intruder(s) Detected",
                        "person_count": person_count,
                        "timestamp": now
                    }
                    await STREAM_CLIENT.video.send_call_event(
                        "default",         # Call type
                        ACTIVE_CALL_ID,    # Call ID
                        "security_agent",  # User ID
                        payload            # The data
                    )
                    print("✅ ALERT SENT!", flush=True)
                else:
                    print("❌ CRITICAL: Global STREAM_CLIENT variable is missing!", flush=True)
                self.last_alert_time = now
            except Exception as e:
                print(f"❌ Alert Failed: {e}", flush=True)


async def main():
    global STREAM_CLIENT

    api_key = os.getenv("STREAM_API_KEY")
    api_secret = os.getenv("STREAM_API_SECRET")
    gemini_key = os.getenv("GEMINI_API_KEY")
    if not api_key:
        raise ValueError("❌ Check .env")

    client = AsyncStream(api_key=api_key, api_secret=api_secret)
    STREAM_CLIENT = client

    call = client.video.call("default", ACTIVE_CALL_ID)
    await call.get_or_create(data={"created_by": {"id": "security-bot"}})
    print(f"📹 Call ID: {ACTIVE_CALL_ID}", flush=True)

    processor = SecurityYOLOProcessor(
        model="yolo11n-pose.pt",
        confidence=0.5,
        cooldown=5.0
    )

    edge = getstream_plugin.Edge(api_key=api_key, api_secret=api_secret)
    llm = gemini.LLM(model="gemini-1.5-flash", api_key=gemini_key)

    agent = agents.Agent(
        edge=edge,
        llm=llm,
        agent_user=User(id="security_agent", name="Security Guard"),
        processors=[processor]
    )

    print("🤖 Agent joining...", flush=True)
    await agent.join(call)
    print("✅ AGENT LIVE!", flush=True)

    try:
        while True:
            await asyncio.sleep(1)
    except KeyboardInterrupt:
        pass


if __name__ == "__main__":
    asyncio.run(main())
The Frontend: Building the Dashboard UI
With the backend successfully broadcasting events, we need a dashboard to visualise them. Our React application acts as the command centre, combining a live video feed with real-time threat logs.
Our dashboard is built on three pillars:
- Video Panel: Renders the WebRTC stream, overlaying the YOLO detection boxes from the backend.
- Event Listener: A background effect that subscribes to the custom WebSocket events sent by our Python backend.
- Event Log Panel: A chronological list of security alerts.
Environment Setup
Instead of hardcoding our secrets, we will store them safely. Create a file named .env in the root of your frontend folder:
# .env
VITE_STREAM_API_KEY=api_key
# Paste the token you generated with the generate_token.py script
VITE_STREAM_TOKEN=token_generated
Project Structure
Here’s how the frontend project is organised. The dashboard logic lives in a single React entry point for simplicity, while sensitive credentials are stored securely in an environment file.
security-frontend/
├── src/
│ ├── App.tsx # Dashboard UI and logic
│ └── main.tsx
└── .env # API keys and access token
Imports & Setup
We import the required Stream Video components along with the necessary React hooks. The API credentials—VITE_STREAM_API_KEY and VITE_STREAM_TOKEN—are loaded from environment variables, allowing the application to authenticate with Stream securely.
import { useEffect, useState } from 'react';
import {
  StreamVideo,
  StreamVideoClient,
  StreamCall,
  StreamTheme,
  useCallStateHooks,
  CallControls,
  SpeakerLayout,
  useCall,
} from '@stream-io/video-react-sdk';
import '@stream-io/video-react-sdk/dist/css/styles.css';

const streamKey = import.meta.env.VITE_STREAM_API_KEY;
const streamToken = import.meta.env.VITE_STREAM_TOKEN;
Configuration
We configure the video call using fixed identifiers for demonstration purposes:
- callId ('security-demo-working'): uniquely identifies the video call session.
- userId ('Shaq12'): represents the current user joining the call.
In a production environment, these values would typically be generated dynamically based on the authenticated user and session context.
const apiKey = streamKey;
const callId = 'security-demo-working';
const userId = 'Shaq12';
const token = streamToken;
const user = { id: userId, name: 'Admin Viewer' };
Client & Call Initialisation
Next, we initialise the Stream Video client using the API key and token generated earlier. This establishes a secure connection to the Stream servers and enables real-time video and event delivery.
const videoClient = new StreamVideoClient({
  apiKey,
  user,
  token,
  options: { logLevel: 'warn' }
});
const call = videoClient.call('default', callId);
call.join({ create: true });
The Event Listener (Alert Detection)
This is the bridge between the Python backend and the React frontend. We use the useCall() hook to access the active call instance, which also serves as our WebSocket transport.
The backend sends structured JSON payloads as custom events, and the frontend subscribes specifically to this channel.
When an event arrives, we inspect the alert_trigger field to confirm whether an intrusion has been detected.
useEffect(() => {
  if (!currentCall) return;

  const handleEvent = (event: any) => {
    const data = event.custom || {};
    if (data.alert_trigger === 'intrusion') {
      console.log("🚨 INTRUDER CONFIRMED:", data);
      const timestamp = new Date().toLocaleTimeString();
      const message = data.message || "Intruder Detected";
      const count = data.person_count || 1;
      const logEntry = `⚠️ ${timestamp}: ${message}`;
      setLogs((prev) => [logEntry, ...prev.slice(0, 9)]);
    }
  };

  currentCall.on('custom', handleEvent);
  return () => {
    currentCall.off('custom', handleEvent);
  };
}, [currentCall]);
Data Flow
The data flow through the frontend follows a simple, predictable pipeline: the backend emits a custom call event, the useCall() listener receives it and checks the alert_trigger field, setLogs() prepends a formatted entry, and React re-renders the log panel and the red video border.
Header Section
The header provides immediate system visibility, displaying the dashboard title and the current agent status.
<div style={{ padding: '20px', borderBottom: '1px solid #333', display: 'flex', justifyContent: 'space-between', alignItems: 'center' }}>
  <h2 style={{ margin: 0 }}>🛡️ Security Control Center</h2>
  <div style={{ fontSize: '14px', color: '#aaa' }}>
    System Status: <span style={{ color: agent ? '#0f0' : '#f00', fontWeight: 'bold' }}>
      {agent ? 'ONLINE' : 'OFFLINE'}
    </span>
  </div>
</div>
Video Feed Panel
The video panel renders the live WebRTC stream and provides immediate visual feedback when a security event occurs.
The container’s border colour is driven directly by application state. When the logs array contains one or more alerts, the border turns red, instantly signalling that attention is required. When no alerts are present, the border returns to its neutral state.
This approach ensures that visual feedback is both reactive and consistent, without requiring additional UI state or side effects.
<div style={{ flex: 3, padding: '20px', display: 'flex', flexDirection: 'column', gap: '10px' }}>
  <div style={{
    flex: 1,
    borderRadius: '15px',
    overflow: 'hidden',
    border: logs.length > 0 ? '2px solid red' : '1px solid #333',
    transition: 'border 0.3s ease'
  }}>
    <SpeakerLayout participantsBarPosition="bottom" />
  </div>
  <div style={{ display: 'flex', justifyContent: 'center' }}>
    <CallControls />
  </div>
</div>
Event Log Panel
The Event Log serves as the system’s source of truth for all detected security events. It records each alert in a clear, chronological format, allowing operators to review activity over time and quickly understand what triggered a visual warning.
Each log entry includes:
- A timestamp indicating when the event occurred
- A short descriptive message provided by the backend
- Visual emphasis to distinguish alerts from normal system activity
To keep the interface readable during high-activity periods, the log is managed as a First-In, First-Out (FIFO) queue. New alerts are added to the top of the list, while older entries are truncated once the maximum display limit is reached. This ensures that the most relevant information is always visible without overwhelming the user.
In addition to alerts, the panel also displays lightweight system metadata—such as the number of active viewers and the connected agent ID—providing useful operational context at a glance.
When no alerts are present, the UI enters a passive monitoring state, clearly indicating that the system is active and scanning for intrusions.
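The capped-log behaviour is easy to isolate. Sketched here in Python for clarity (the React version later in this article applies the same prepend-and-slice pattern inside setLogs):

```python
MAX_LOGS = 10  # display limit for the event log panel

def push_log(logs: list[str], entry: str) -> list[str]:
    """Prepend the newest alert and truncate to the display limit."""
    return [entry, *logs[:MAX_LOGS - 1]]
```

Newest entries always land at index 0, and the list never grows past ten items, so the panel stays readable even during sustained activity.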
<div style={{ flex: 1, background: '#1a1a1a', padding: '20px', borderLeft: '1px solid #333', overflowY: 'auto' }}>
  <h3 style={{ marginTop: 0, borderBottom: '1px solid #444', paddingBottom: '10px' }}>🚨 Event Log</h3>
  <div style={{ fontSize: '12px', color: '#666', marginBottom: '15px', padding: '10px', background: '#111', borderRadius: '5px' }}>
    <div>Active Viewers: {participants.length}</div>
    <div>Bot ID: {agent ? agent.userId : 'Searching...'}</div>
  </div>
  {logs.length === 0 ? (
    <div style={{ color: '#666', fontStyle: 'italic', marginTop: 20 }}>
      Waiting for alerts...<br/>
      <small>System is scanning for intruders.</small>
    </div>
  ) : (
    <ul style={{ listStyle: 'none', padding: 0 }}>
      {logs.map((log, i) => (
        <li key={i} style={{
          padding: '10px', marginBottom: '8px',
          background: 'rgba(255, 0, 0, 0.15)',
          borderLeft: '3px solid #ff4444',
          color: '#ffcccc',
          borderRadius: '0 4px 4px 0',
          animation: 'fadeIn 0.3s ease-in'
        }}>
          {log}
        </li>
      ))}
    </ul>
  )}
</div>
The Complete Source Code (App.tsx)
We have examined the core logic blocks individually. Now, let’s assemble them into the final implementation file.
Copy the code below into App.tsx.
import { useEffect, useState } from 'react';
import {
  StreamVideo,
  StreamVideoClient,
  StreamCall,
  StreamTheme,
  useCallStateHooks,
  CallControls,
  SpeakerLayout,
  useCall, // <--- IMPORTANT: We need this hook to listen to events
} from '@stream-io/video-react-sdk';
import '@stream-io/video-react-sdk/dist/css/styles.css';

const streamKey = import.meta.env.VITE_STREAM_API_KEY;
const streamToken = import.meta.env.VITE_STREAM_TOKEN;

// --- CONFIGURATION ---
const apiKey = streamKey;
const callId = 'security-demo-working'; // MUST MATCH BACKEND
const userId = 'Shaq12';
const token = streamToken;
const user = { id: userId, name: 'Admin Viewer' };

// --- CLIENT SETUP ---
const videoClient = new StreamVideoClient({
  apiKey,
  user,
  token,
  options: { logLevel: 'warn' }
});
const call = videoClient.call('default', callId);
call.join({ create: true });

// --- MAIN LAYOUT COMPONENT ---
const SecurityLayout = () => {
  const { useParticipants } = useCallStateHooks();
  const participants = useParticipants();
  const [logs, setLogs] = useState<string[]>([]);

  // 1. GET THE ACTIVE CALL OBJECT
  const currentCall = useCall();

  // 2. LISTEN FOR ALERTS
  useEffect(() => {
    if (!currentCall) return;

    const handleEvent = (event: any) => {
      // DEBUG: Log everything so we can see what arrives
      // console.log("📨 Event Received:", event);

      // 1. Grab the custom data packet
      // Stream puts our payload inside 'custom'
      const data = event.custom || {};

      // 2. Check for our specific trigger key
      if (data.alert_trigger === 'intrusion') {
        console.log("🚨 INTRUDER CONFIRMED:", data);
        const timestamp = new Date().toLocaleTimeString();
        const message = data.message || "Intruder Detected";
        const count = data.person_count || 1;
        const logEntry = `⚠️ ${timestamp}: ${message}`;
        setLogs((prev) => [logEntry, ...prev.slice(0, 9)]);
      }
    };

    currentCall.on('custom', handleEvent);
    return () => {
      currentCall.off('custom', handleEvent);
    };
  }, [currentCall]);

  // Check if the Security Bot is in the room
  const agent = participants.find((p) =>
    p.userId.includes('security') || p.userId.includes('agent')
  );

  return (
    <div style={{ width: '100vw', height: '100vh', background: '#0e0e0e', color: 'white', fontFamily: 'Arial, sans-serif' }}>
      {/* HEADER */}
      <div style={{ padding: '20px', borderBottom: '1px solid #333', display: 'flex', justifyContent: 'space-between', alignItems: 'center' }}>
        <h2 style={{ margin: 0 }}>🛡️ Security Control Center</h2>
        <div style={{ fontSize: '14px', color: '#aaa' }}>
          System Status: <span style={{ color: agent ? '#0f0' : '#f00', fontWeight: 'bold' }}>
            {agent ? 'ONLINE' : 'OFFLINE'}
          </span>
        </div>
      </div>

      <div style={{ display: 'flex', height: 'calc(100vh - 80px)' }}>
        {/* VIDEO FEED (Left Panel) */}
        <div style={{ flex: 3, padding: '20px', display: 'flex', flexDirection: 'column', gap: '10px' }}>
          <div style={{
            flex: 1,
            borderRadius: '15px',
            overflow: 'hidden',
            border: logs.length > 0 ? '2px solid red' : '1px solid #333',
            transition: 'border 0.3s ease'
          }}>
            <SpeakerLayout participantsBarPosition="bottom" />
          </div>
          <div style={{ display: 'flex', justifyContent: 'center' }}>
            <CallControls />
          </div>
        </div>

        {/* LOG PANEL (Right Panel) */}
        <div style={{ flex: 1, background: '#1a1a1a', padding: '20px', borderLeft: '1px solid #333', overflowY: 'auto' }}>
          <h3 style={{ marginTop: 0, borderBottom: '1px solid #444', paddingBottom: '10px' }}>🚨 Event Log</h3>
          <div style={{ fontSize: '12px', color: '#666', marginBottom: '15px', padding: '10px', background: '#111', borderRadius: '5px' }}>
            <div>Active Viewers: {participants.length}</div>
            <div>Bot ID: {agent ? agent.userId : 'Searching...'}</div>
          </div>
          {logs.length === 0 ? (
            <div style={{ color: '#666', fontStyle: 'italic', marginTop: 20 }}>
              Waiting for alerts...<br/>
              <small>System is scanning for intruders.</small>
            </div>
          ) : (
            <ul style={{ listStyle: 'none', padding: 0 }}>
              {logs.map((log, i) => (
                <li key={i} style={{
                  padding: '10px', marginBottom: '8px',
                  background: 'rgba(255, 0, 0, 0.15)',
                  borderLeft: '3px solid #ff4444',
                  color: '#ffcccc',
                  borderRadius: '0 4px 4px 0',
                  animation: 'fadeIn 0.3s ease-in'
                }}>
                  {log}
                </li>
              ))}
            </ul>
          )}
        </div>
      </div>
    </div>
  );
};

export default function App() {
  return (
    <StreamVideo client={videoClient}>
      <StreamTheme>
        <StreamCall call={call}>
          <SecurityLayout />
        </StreamCall>
      </StreamTheme>
    </StreamVideo>
  );
}
Running the Application
Now that the code is built, let's see it in action.
Start the Python Agent (Backend)
In your backend directory, run the command below, which ensures the env files are loaded:
uv run --env-file .env main.py
You should see the agent load both models, join the call, and begin printing frame heartbeats. When an intruder is detected, the backend logs show the alert firing and being sent to the frontend.
Start the React App (Frontend)
npm run dev
Vite will print a local development URL. Open http://localhost:5173 to view your dashboard.
Testing the Security System
- Clear the Camera: Step out of the frame so the camera sees no one.
- Enter the Frame: Walk into view.
- Watch the Logs: You should see the backend print 🚨 1 PERSON(S) DETECTED! and the frontend log panel update instantly with a red border flash.
Congratulations! You have just built a real-time AI security agent.
Complete Source Code
We have built the Backend Security Agent and the Frontend Dashboard as two distinct services. You can find the complete source code for both parts in the repositories below:
- Backend Security Agent (Python): Contains the custom YOLO processor, agent configuration, and token generation script.
- Frontend Dashboard (React): Contains the video player component, event listeners, and alert UI logic.
Conclusion
In this tutorial, we built a real-time security dashboard using Python and Vision Agents on the backend and React with Stream Video on the frontend. Along the way, we covered:
- Implementing human-focused detection logic
- Setting up a real-time video interface
- Broadcasting and displaying security alert events
Although this example uses standard YOLO models, Stream Vision Agents also support integrations with other vision and multimodal systems such as Roboflow, MoonDream, and Decart.
Want to keep building with Vision Agents? Explore these tutorials next: