Getting hardware to transmit data from a moving bus is hard. Processing that data into actionable, real-time insights is a completely different challenge.
In Part 1, I built an ESP32-CAM node to push GPS and image data over a 2G cellular network.
Now, I need a backend capable of catching that data, analyzing the crowd size using AI, and syncing it to a mobile app in milliseconds. Here is how I architected the cloud infrastructure using FastAPI, Hugging Face, Google Gemini, and React.
1. The FastAPI Receiver
The ESP32 sends a continuous stream of HTTP POST requests. I deployed a Python FastAPI backend on Hugging Face Spaces to catch them.
Memory on the hardware side is tight. I couldn't send massive JSON payloads without crashing the board.
Instead, I injected the GPS coordinates and speed directly into the HTTP headers. The actual image is sent as raw binary data in the request body.
```python
from fastapi import FastAPI, Header, Request
from fastapi.responses import JSONResponse

app = FastAPI()

@app.post("/api/update-bus")
async def update_bus(
    request: Request,
    bus_id: str = Header(..., alias="bus-id"),
    bus_lat: float = Header(..., alias="bus-lat"),
    bus_lng: float = Header(..., alias="bus-lng"),
    bus_speed: float = Header(..., alias="bus-speed"),
):
    # Read the raw image binary directly from the request body
    image_bytes = await request.body()
    if not image_bytes:
        return JSONResponse({"status": "error", "message": "No image data"}, status_code=400)
```
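For reference, here is what the device-side request looks like, simulated with Python's standard library rather than the ESP32 firmware. The URL is a placeholder and `build_headers`/`send_frame` are illustrative names; only the header aliases match the endpoint above.

```python
import urllib.request

URL = "https://example.hf.space/api/update-bus"  # placeholder, not the real Space

def build_headers(bus_id: str, lat: float, lng: float, speed: float) -> dict:
    """Pack telemetry into HTTP headers, matching the FastAPI Header aliases."""
    return {
        "bus-id": bus_id,
        "bus-lat": str(lat),
        "bus-lng": str(lng),
        "bus-speed": str(speed),
        "Content-Type": "application/octet-stream",
    }

def send_frame(image_bytes: bytes, **telemetry) -> int:
    # The JPEG travels as the raw body; all metadata rides in the headers
    req = urllib.request.Request(
        URL, data=image_bytes, headers=build_headers(**telemetry), method="POST"
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

Keeping metadata out of the body means the firmware never has to build or escape a JSON payload, which is what made this workable on the ESP32's limited RAM.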
2. Dual AI Processing (The Flex)
Relying on a single AI model for object detection in a production environment is risky. If the API rate-limits or fails, the entire tracking app breaks.
To solve this, I used two different models: a Hugging Face DETR model and Google's Gemini 2.5 Flash API.
Running them sequentially would take too long and introduce severe latency to the real-time map. I used Python's ThreadPoolExecutor to run both AI inferences concurrently on separate threads.
```python
from concurrent.futures import ThreadPoolExecutor

# 1. Concurrent AI analysis
gemini_count = 0
hf_count = 0
with ThreadPoolExecutor(max_workers=2) as executor:
    future_gem = executor.submit(analyze_with_gemini, image_bytes)
    future_hf = executor.submit(analyze_with_huggingface, image_bytes)
    gemini_count = future_gem.result()
    hf_count = future_hf.result()

# 2. Calculate average & crowd level
if gemini_count > 0 and hf_count > 0:
    avg_count = round((gemini_count + hf_count) / 2)
else:
    avg_count = max(gemini_count, hf_count)  # Fallback if one API fails
```
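One subtlety: `future.result()` re-raises any exception from the worker thread, so the `max()` fallback only helps if a failing analyzer returns 0 instead of raising. A minimal guard for that (my sketch, not the author's helper; `safe_count` is an assumed name):

```python
def safe_count(analyze, image_bytes: bytes) -> int:
    """Run one analyzer; treat any failure (rate limit, timeout) as a zero count."""
    try:
        return analyze(image_bytes)
    except Exception:
        return 0
```

Submitting `safe_count` with each analyzer (e.g. `executor.submit(safe_count, analyze_with_gemini, image_bytes)`) keeps a single API outage from taking down the whole request handler.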
Running both models in parallel roughly halves the processing time compared to calling them sequentially, and it guarantees a crowd count even if one API goes offline. The avg_count is then mapped to a string status: Low, Moderate, High, or Very High.
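The thresholds for that count-to-status mapping aren't shown in the original snippet, so the cut-offs below are assumptions for illustration; only the four status strings come from the actual app.

```python
def crowd_level(avg_count: int) -> str:
    """Map a passenger count to a display status (illustrative thresholds)."""
    if avg_count <= 10:
        return "Low"
    if avg_count <= 25:
        return "Moderate"
    if avg_count <= 40:
        return "High"
    return "Very High"
```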
3. Firebase Real-Time Sync
Once the backend calculates the crowd level, it needs to push the data to the frontend immediately.
I used Firebase Firestore. The critical detail here is updating the live location and crowd status without overwriting static database fields (like the bus route number or destination).
```python
from firebase_admin import firestore

# `db` is an initialized firebase_admin Firestore client
try:
    bus_ref = db.collection('public').document('data').collection('buses').document(bus_id)
    bus_data = {
        "id": bus_id,
        "lat": bus_lat,
        "lng": bus_lng,
        "speed": bus_speed,
        "peopleCount": avg_count,
        "crowdLevel": crowd_status,
        "lastUpdated": firestore.SERVER_TIMESTAMP,
    }
    # merge=True prevents overwriting existing route data
    bus_ref.set(bus_data, merge=True)
except Exception as e:
    print(f"Firestore write failed: {e}")
```
4. The React Frontend
The final piece of the architecture is the user interface. The React app actively listens to the Firestore database.
When a user searches for a route, the app dynamically renders the bus cards. It reads the crowdLevel string and injects Tailwind CSS classes to change the UI colors: amber for moderate levels, red when the bus is very crowded.
```jsx
// Determine color coding for crowd levels
const crowd = bus.crowdLevel || 'Moderate';
const crowdIsVery = crowd.toLowerCase().includes('very');

// Inside the component return:
<div
  className={`flex items-center gap-1.5 text-sm font-bold px-3 py-1 rounded-full ${
    crowdIsVery ? 'bg-red-50 text-red-600' : 'bg-amber-50 text-amber-600'
  }`}
>
  <Users size={14} />
  <span>{crowd}</span>
</div>
```
The Result
A hardware node captures the real world. A Python backend processes it concurrently using dual AI models. A React app visualizes it for the user in real time.
This architecture successfully bridges physical IoT with cloud AI.