Getting hardware to transmit data from a moving bus is hard. Processing that data into actionable, real-time insights is a completely different challenge.
In Part 1, I built an ESP32-CAM node to push GPS and image data over a 2G cellular network.
Now, I need a backend capable of catching that data, analyzing the crowd size using AI, and syncing it to a mobile app in milliseconds. Here is how I architected the cloud infrastructure using FastAPI, Hugging Face, Google Gemini, and React.
1. The FastAPI Receiver
The ESP32 sends a continuous stream of HTTP POST requests. I deployed a Python FastAPI backend on Hugging Face Spaces to catch them.
Memory on the hardware side is tight. I couldn't send massive JSON payloads without crashing the board.
Instead, I injected the GPS coordinates and speed directly into the HTTP headers. The actual image is sent as raw binary data in the request body.
```python
from fastapi import FastAPI, Header, Request
from fastapi.responses import JSONResponse

app = FastAPI()

@app.post("/api/update-bus")
async def update_bus(
    request: Request,
    bus_id: str = Header(..., alias="bus-id"),
    bus_lat: float = Header(..., alias="bus-lat"),
    bus_lng: float = Header(..., alias="bus-lng"),
    bus_speed: float = Header(..., alias="bus-speed"),
):
    # Read the raw image binary directly from the request body
    image_bytes = await request.body()
    if not image_bytes:
        return JSONResponse({"status": "error", "message": "No image data"}, status_code=400)
```
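For reference, here is what the device-side request looks like, simulated with Python's standard library rather than the ESP32 firmware. The URL is a placeholder and `build_headers`/`send_frame` are illustrative names; only the header aliases match the endpoint above.

```python
import urllib.request

URL = "https://example.hf.space/api/update-bus"  # placeholder, not the real Space

def build_headers(bus_id: str, lat: float, lng: float, speed: float) -> dict:
    """Pack telemetry into HTTP headers, matching the FastAPI Header aliases."""
    return {
        "bus-id": bus_id,
        "bus-lat": str(lat),
        "bus-lng": str(lng),
        "bus-speed": str(speed),
        "Content-Type": "application/octet-stream",
    }

def send_frame(image_bytes: bytes, **telemetry) -> int:
    # The JPEG travels as the raw body; all metadata rides in the headers
    req = urllib.request.Request(
        URL, data=image_bytes, headers=build_headers(**telemetry), method="POST"
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

Keeping metadata out of the body means the firmware never has to build or escape a JSON payload, which is what made this workable on the ESP32's limited RAM.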
2. Dual AI Processing (The Flex)
Relying on a single AI model for object detection in a production environment is risky. If the API rate-limits or fails, the entire tracking app breaks.
To solve this, I used two different models: a Hugging Face DETR model and Google's Gemini 2.5 Flash API.
Running them sequentially would take too long and introduce severe latency to the real-time map. I used Python's ThreadPoolExecutor to run both AI inferences concurrently on separate threads.
```python
from concurrent.futures import ThreadPoolExecutor

# 1. Concurrent AI analysis
gemini_count = 0
hf_count = 0
with ThreadPoolExecutor(max_workers=2) as executor:
    future_gem = executor.submit(analyze_with_gemini, image_bytes)
    future_hf = executor.submit(analyze_with_huggingface, image_bytes)
    gemini_count = future_gem.result()
    hf_count = future_hf.result()

# 2. Calculate average & crowd level
if gemini_count > 0 and hf_count > 0:
    avg_count = round((gemini_count + hf_count) / 2)
else:
    avg_count = max(gemini_count, hf_count)  # Fallback if one API fails
```
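One subtlety: `future.result()` re-raises any exception from the worker thread, so the `max()` fallback only helps if a failing analyzer returns 0 instead of raising. A minimal guard for that (my sketch, not the author's helper; `safe_count` is an assumed name):

```python
def safe_count(analyze, image_bytes: bytes) -> int:
    """Run one analyzer; treat any failure (rate limit, timeout) as a zero count."""
    try:
        return analyze(image_bytes)
    except Exception:
        return 0
```

Submitting `safe_count` with each analyzer (e.g. `executor.submit(safe_count, analyze_with_gemini, image_bytes)`) keeps a single API outage from taking down the whole request handler.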
Running both models in parallel roughly halves the processing time compared to calling them sequentially, and it guarantees a crowd count even if one API goes offline. The avg_count is then mapped to a string status: Low, Moderate, High, or Very High.
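The thresholds for that count-to-status mapping aren't shown in the original snippet, so the cut-offs below are assumptions for illustration; only the four status strings come from the actual app.

```python
def crowd_level(avg_count: int) -> str:
    """Map a passenger count to a display status (illustrative thresholds)."""
    if avg_count <= 10:
        return "Low"
    if avg_count <= 25:
        return "Moderate"
    if avg_count <= 40:
        return "High"
    return "Very High"
```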
3. Firebase Real-Time Sync
Once the backend calculates the crowd level, it needs to push the data to the frontend immediately.
I used Firebase Firestore. The critical detail here is updating the live location and crowd status without overwriting static database fields (like the bus route number or destination).
```python
from firebase_admin import firestore

# `db` is an initialized firebase_admin Firestore client
try:
    bus_ref = db.collection('public').document('data').collection('buses').document(bus_id)
    bus_data = {
        "id": bus_id,
        "lat": bus_lat,
        "lng": bus_lng,
        "speed": bus_speed,
        "peopleCount": avg_count,
        "crowdLevel": crowd_status,
        "lastUpdated": firestore.SERVER_TIMESTAMP,
    }
    # merge=True prevents overwriting existing route data
    bus_ref.set(bus_data, merge=True)
except Exception as e:
    print(f"Firestore write failed: {e}")
```
4. The React Frontend
The final piece of the architecture is the user interface. The React app actively listens to the Firestore database.
When a user searches for a route, the app dynamically renders the bus cards. It reads the crowdLevel string and injects Tailwind CSS classes to change the UI colors: amber for moderate levels, red when the bus is very crowded.
```jsx
// Determine color coding for crowd levels
const crowd = bus.crowdLevel || 'Moderate';
const crowdIsVery = crowd.toLowerCase().includes('very');

// Inside the component return:
<div
  className={`flex items-center gap-1.5 text-sm font-bold px-3 py-1 rounded-full ${
    crowdIsVery ? 'bg-red-50 text-red-600' : 'bg-amber-50 text-amber-600'
  }`}
>
  <Users size={14} />
  <span>{crowd}</span>
</div>
```
The Result
A hardware node captures the real world. A Python backend processes it concurrently using dual AI models. A React app visualizes it for the user in real time.
This architecture successfully bridges physical IoT with cloud AI.