I'm Building an AI System That Tracks Criminals Across an Entire City – Here's How
Every day, cities like Lahore, Karachi, and Islamabad record thousands of hours of CCTV footage. But when a crime happens, police often have to manually sift through video after the criminal has already fled. What if the cameras themselves could not only detect a weapon but actively track the suspect across the entire city and predict where they’re heading next?
That’s the question driving my Final Year Project.
The Problem
- Current Safe City projects use AI mainly for facial recognition and traffic enforcement.
- Weapon detection research exists, but it stops at a single camera.
- As far as I can tell, there’s no integrated system that detects a crime in progress, verifies it, and then launches a real-time cross-camera manhunt.
The Idea: AegisNet
AegisNet is a retrofittable, edge-AI surveillance system that upgrades ordinary CCTV cameras to work as an intelligent, city-wide security grid.
Here’s what makes it different:
Two-Tier Alert Logic
| Level | Trigger | System Response |
|---|---|---|
| 1 | A person is seen carrying a weapon (gun/knife) | A 15-second video clip is instantly sent to the police dashboard. No city-wide tracking yet. |
| 2 | The weapon is used – a shot fired, an attack, or someone falling after an assault | Full cross-camera manhunt activated. The suspect’s appearance embedding is matched across all connected cameras in real time. |
This escalation prevents alarm fatigue while ensuring that when seconds matter, the system reacts like a city-wide digital watchdog.
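Here is a minimal sketch of how that escalation could be expressed in code. It is an illustration, not the actual AegisNet implementation: `FrameEvents`, `Incident`, and the action names are hypothetical, and it assumes Level 2 requires that a weapon has already been seen at some point during the incident.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class AlertLevel(IntEnum):
    NONE = 0
    WEAPON_VISIBLE = 1  # Level 1: clip goes to the dashboard
    WEAPON_USED = 2     # Level 2: cross-camera tracking starts


@dataclass
class FrameEvents:
    """Hypothetical per-frame summary produced by the detectors."""
    weapon_visible: bool = False
    violent_action: bool = False  # shot fired, attack, or a fall after an assault


class Incident:
    """Tracks the alert level of one incident; the level only goes up."""

    def __init__(self) -> None:
        self.level = AlertLevel.NONE
        self.weapon_seen = False

    def update(self, events: FrameEvents) -> List[str]:
        """Return the actions to trigger for this frame (possibly none)."""
        if events.weapon_visible:
            self.weapon_seen = True

        # Assumption: violence without any weapon sighting does not escalate.
        target = AlertLevel.NONE
        if self.weapon_seen:
            target = (AlertLevel.WEAPON_USED if events.violent_action
                      else AlertLevel.WEAPON_VISIBLE)

        actions: List[str] = []
        if target > self.level:
            if self.level < AlertLevel.WEAPON_VISIBLE:
                actions.append("send_15s_clip")
            if target == AlertLevel.WEAPON_USED:
                actions.append("start_cross_camera_tracking")
            self.level = target
        return actions
```

Because each action fires only when the level rises, a suspect who stays in frame holding a weapon produces one clip rather than one alert per frame, which is the alarm-fatigue point above.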
Cross-Camera Suspect Tracking (ReID)
Once Level 2 is triggered, the suspect isn’t just a dot on a single screen. Their visual identity – clothing, posture, unique features – is turned into a numerical embedding that’s broadcast to every camera node. As the person moves from one field of view to another, the system reconstructs their path automatically.
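To make the matching step concrete, here is a small sketch of how a camera node might compare the broadcast suspect embedding against the people it currently sees. Cosine similarity is a common choice for Re-ID embeddings, but the metric, the `0.7` threshold, and the function names are my assumptions for illustration, not settled design decisions.

```python
import math
from typing import Dict, List, Optional, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two embeddings (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(
    suspect: List[float],
    local_tracks: Dict[str, List[float]],
    threshold: float = 0.7,  # hypothetical value; would need tuning on real data
) -> Optional[Tuple[str, float]]:
    """Return (track_id, score) for the most similar local track, or None."""
    best: Optional[Tuple[str, float]] = None
    for track_id, embedding in local_tracks.items():
        score = cosine_similarity(suspect, embedding)
        if score >= threshold and (best is None or score > best[1]):
            best = (track_id, score)
    return best
```

A node that gets a match would report its camera ID and timestamp back to the server, and the sequence of those reports is what reconstructs the path.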
Escape Route Prediction
We model the city’s camera network as a graph. When a suspect is last seen at a certain location, the system predicts the streets they’re most likely to take and suggests cordon points to law enforcement – all in real time.
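One simple way to get cordon suggestions from a camera graph is a reachability search: find every node the suspect could have reached within a time budget, then flag the first nodes just outside that set. The sketch below uses Dijkstra's algorithm over hypothetical travel times in seconds. It is a stand-in I chose for illustration; the prediction method in the final system may well differ (for example, a probabilistic model).

```python
import heapq
from typing import Dict, List

Graph = Dict[str, Dict[str, float]]  # node -> {neighbour: travel time in seconds}


def reachable_within(graph: Graph, start: str, budget_s: float) -> Dict[str, float]:
    """Shortest travel time to every node reachable within budget_s (Dijkstra)."""
    best = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        elapsed, node = heapq.heappop(heap)
        if elapsed > best.get(node, float("inf")):
            continue  # stale heap entry
        for neighbour, cost in graph.get(node, {}).items():
            total = elapsed + cost
            if total <= budget_s and total < best.get(neighbour, float("inf")):
                best[neighbour] = total
                heapq.heappush(heap, (total, neighbour))
    return best


def cordon_candidates(graph: Graph, start: str, budget_s: float) -> List[str]:
    """Nodes adjacent to the reachable set that the suspect cannot have reached yet."""
    reached = reachable_within(graph, start, budget_s)
    frontier = set()
    for node in reached:
        for neighbour in graph.get(node, {}):
            if neighbour not in reached:
                frontier.add(neighbour)
    return sorted(frontier)


# Toy network: camera nodes A to E with made-up travel times.
CITY: Graph = {
    "A": {"B": 30, "C": 60},
    "B": {"A": 30, "D": 45},
    "C": {"A": 60, "D": 30},
    "D": {"B": 45, "C": 30, "E": 90},
    "E": {"D": 90},
}
```

With the suspect last seen at `A`, `cordon_candidates(CITY, "A", 60)` suggests `D`, and widening the budget to 80 seconds pushes the cordon out to `E`.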
Privacy by Design
All AI processing happens on low-cost edge devices (like NVIDIA Jetson or industrial gateways) attached to existing cameras. Continuous raw footage never leaves the camera site; only short incident clips and anonymous metadata travel to the central police server.
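One way to enforce this at the edge is an allow-list on outgoing messages, so that anything not explicitly permitted (such as pixel data) is dropped before it is sent. The field names below are hypothetical; the real message schema is not finalized.

```python
import json

# Hypothetical allow-list: the only fields an edge node may send upstream.
ALLOWED_FIELDS = {"camera_id", "timestamp", "alert_level", "embedding", "clip_id"}


def to_wire(event: dict) -> str:
    """Serialize an event, silently dropping any field not on the allow-list."""
    safe = {key: value for key, value in event.items() if key in ALLOWED_FIELDS}
    return json.dumps(safe, sort_keys=True)


# Example: the raw frame is stripped even if upstream code attaches it by mistake.
payload = to_wire({
    "camera_id": "cam-07",
    "timestamp": 1700000000.0,
    "alert_level": 2,
    "embedding": [0.25, 0.5],
    "clip_id": "clip-001",
    "frame": "<raw pixels>",
})
```

An allow-list fails closed: a new field added to the detector output stays on the device until someone deliberately permits it.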
The Tech Stack (So Far)
- YOLOv8 for weapon and person detection
- Person Re-ID models for cross-camera identity matching
- MoveNet (pose estimation) / video classifiers for activity recognition (fights, gunfire)
- Message queues (RabbitMQ / Redis) for scalable server-to-server communication (designed, but not yet fully built in the prototype)
- Flask + WebSockets for the real-time police dashboard
- Docker for containerized edge emulation
We’re building the prototype using a mix of USB and IP cameras and a single laptop acting as the central server, simulating the edge-to-server architecture that would eventually be deployed on real Jetson devices.
Why This? Why Now?
Several universities in Pakistan have worked on weapon detection (UET, NUST, GIFT). Others have tackled person re-identification (NUST, Bahria). Government Safe City projects have thousands of cameras with facial recognition. But as far as I can tell, no one has integrated all of these into a single, actionable, real-time criminal tracking pipeline, especially one that can be retrofitted onto existing infrastructure.
AegisNet is that missing layer.
I’d Love Your Input!
If you’ve worked on:
- Smart city surveillance
- Person re-identification at the edge
- Real-time video analytics pipelines
- Privacy-preserving AI
… I’d genuinely appreciate your thoughts, critiques, or suggestions on the architecture or any pitfalls you foresee.
Let’s discuss in the comments!