<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Anhaj Uwaisulkarni</title>
    <description>The latest articles on DEV Community by Anhaj Uwaisulkarni (@anhaj0).</description>
    <link>https://dev.to/anhaj0</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3773551%2Ff468f771-fd5a-4b02-b44c-a1c4a8688b26.png</url>
      <title>DEV Community: Anhaj Uwaisulkarni</title>
      <link>https://dev.to/anhaj0</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/anhaj0"/>
    <language>en</language>
    <item>
      <title>Building a Live Bus Tracker: AI Crowd Analysis &amp; Real-Time Sync (Part 2)</title>
      <dc:creator>Anhaj Uwaisulkarni</dc:creator>
      <pubDate>Sun, 15 Feb 2026 07:20:11 +0000</pubDate>
      <link>https://dev.to/anhaj0/building-a-live-bus-tracker-ai-crowd-analysis-real-time-sync-part-2-46pk</link>
      <guid>https://dev.to/anhaj0/building-a-live-bus-tracker-ai-crowd-analysis-real-time-sync-part-2-46pk</guid>
      <description>&lt;p&gt;Getting hardware to transmit data from a moving bus is hard. Processing that data into actionable, real-time insights is a completely different challenge.&lt;/p&gt;

&lt;p&gt;In Part 1, I built an ESP32-CAM node to push GPS and image data over a 2G cellular network.&lt;/p&gt;

&lt;p&gt;Now, I need a backend capable of catching that data, analyzing the crowd size using AI, and syncing it to a mobile app in milliseconds. Here is how I architected the cloud infrastructure using FastAPI, Hugging Face, Google Gemini, and React.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The FastAPI Receiver
&lt;/h3&gt;

&lt;p&gt;The ESP32 sends a continuous stream of HTTP POST requests. I deployed a Python FastAPI backend on Hugging Face Spaces to catch them.&lt;/p&gt;

&lt;p&gt;Memory on the hardware side is tight. I couldn't send massive JSON payloads without crashing the board.&lt;/p&gt;

&lt;p&gt;Instead, I injected the GPS coordinates and speed directly into the HTTP headers. The actual image is sent as raw binary data in the request body.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
@app.post("/api/update-bus")
async def update_bus(
    request: Request,
    bus_id: str = Header(..., alias="bus-id"),
    bus_lat: float = Header(..., alias="bus-lat"),
    bus_lng: float = Header(..., alias="bus-lng"),
    bus_speed: float = Header(..., alias="bus-speed"),
):
    # Read raw image binary directly from the request body
    image_bytes = await request.body()

    if not image_bytes:
        return JSONResponse({"status": "error", "message": "No image data"}, status_code=400)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
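&lt;p&gt;To sanity-check this contract without hardware, you can build the same request shape in Python. Everything below is illustrative: the localhost URL, the coordinates, and the fake JPEG bytes are placeholders, and the request is only prepared, never sent.&lt;/p&gt;

```python
from urllib.request import Request

# Placeholder payload: JPEG magic bytes plus padding stands in for a real frame
fake_jpeg = b"\xff\xd8\xff\xe0" + bytes(128)

# Telemetry rides in custom headers; the image is the raw body (no JSON wrapper)
req = Request(
    "http://localhost:7860/api/update-bus",  # hypothetical local deployment
    data=fake_jpeg,
    headers={
        "bus-id": "bus-42",
        "bus-lat": "6.0535",
        "bus-lng": "80.2210",
        "bus-speed": "32.5",
        "Content-Type": "application/octet-stream",
    },
    method="POST",
)

# urllib normalizes header keys to Capitalized form
print(req.get_header("Bus-id"), len(req.data))  # -> bus-42 132
```

&lt;p&gt;Note that the body is the bare JPEG bytes, exactly what &lt;code&gt;await request.body()&lt;/code&gt; reads on the FastAPI side.&lt;/p&gt;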



&lt;h3&gt;
  
  
  2. Dual AI Processing (The Flex)
&lt;/h3&gt;

&lt;p&gt;Relying on a single AI model for object detection in a production environment is risky. If the API rate-limits or fails, the entire tracking app breaks.&lt;/p&gt;

&lt;p&gt;To solve this, I used two different models: a Hugging Face DETR model and Google's Gemini 2.5 Flash API.&lt;/p&gt;

&lt;p&gt;Running them sequentially would take too long and introduce severe latency to the real-time map. I used Python's ThreadPoolExecutor to run both AI inferences concurrently on separate threads.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
    # 1. Concurrent AI Analysis
    gemini_count = 0
    hf_count = 0

    with ThreadPoolExecutor(max_workers=2) as executor:
        future_gem = executor.submit(analyze_with_gemini, image_bytes)
        future_hf = executor.submit(analyze_with_huggingface, image_bytes)
        gemini_count = future_gem.result()
        hf_count = future_hf.result()

    # 2. Calculate Average &amp;amp; Crowd Level
    if gemini_count &amp;gt; 0 and hf_count &amp;gt; 0:
        avg_count = round((gemini_count + hf_count) / 2)
    else:
        avg_count = max(gemini_count, hf_count) # Fallback if one API fails
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Running both models in parallel roughly halves the processing time. It also guarantees a crowd count even if one API goes offline. The avg_count is then mapped to a string status: Low, Moderate, High, or Very High.&lt;/p&gt;
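&lt;p&gt;The post doesn't list the exact thresholds, so here is a minimal sketch of the count-to-status mapping; the cutoffs are assumptions to be tuned to the bus's real capacity.&lt;/p&gt;

```python
def crowd_status(avg_count: int) -> str:
    """Map an averaged head count onto a display status.

    The cutoffs below are illustrative, not the article's actual values.
    """
    if avg_count <= 10:
        return "Low"
    if avg_count <= 25:
        return "Moderate"
    if avg_count <= 40:
        return "High"
    return "Very High"

print(crowd_status(8), crowd_status(33), crowd_status(50))
# -> Low High Very High
```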

&lt;h3&gt;
  
  
  3. Firebase Real-Time Sync
&lt;/h3&gt;

&lt;p&gt;Once the backend calculates the crowd level, it needs to push the data to the frontend immediately.&lt;/p&gt;

&lt;p&gt;I used Firebase Firestore. The critical detail here is updating the live location and crowd status without overwriting static database fields (like the bus route number or destination).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
        try:
            bus_ref = db.collection('public').document('data').collection('buses').document(bus_id)

            bus_data = {
                "id": bus_id,
                "lat": bus_lat,
                "lng": bus_lng,
                "speed": bus_speed,
                "peopleCount": avg_count,
                "crowdLevel": crowd_status,
                "lastUpdated": firestore.SERVER_TIMESTAMP
            }

            # merge=True prevents overwriting existing route data
            bus_ref.set(bus_data, merge=True)
        except Exception as e:
            print(f"Firestore Write Failed: {e}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
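&lt;p&gt;The merge semantics are worth spelling out. Ignoring the Firestore client entirely, merge=True behaves like a field-level update on the stored document, which a plain dict models well enough (the field values here are made up):&lt;/p&gt;

```python
# Plain-dict model of set(..., merge=True); illustrative, not the Firestore client
stored = {"id": "bus-42", "route": "101", "destination": "Galle Fort"}  # static fields
live = {"lat": 6.0535, "lng": 80.2210, "peopleCount": 18, "crowdLevel": "Moderate"}

stored.update(live)  # live telemetry lands; route and destination survive

print(stored["route"], stored["crowdLevel"])
# -> 101 Moderate
```

&lt;p&gt;Without merge=True, set() replaces the whole document, so the static route fields would vanish on every telemetry update.&lt;/p&gt;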



&lt;h3&gt;
  
  
  4. The React Frontend
&lt;/h3&gt;

&lt;p&gt;The final piece of the architecture is the user interface. The React app actively listens to the Firestore database.&lt;/p&gt;

&lt;p&gt;When a user searches for a route, the app dynamically renders the bus cards. It reads the crowdLevel string and applies Tailwind CSS classes to switch the UI colors: amber for moderate, red for very crowded.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
// Determining color coding for crowd levels
const crowd = bus.crowdLevel || 'Moderate';
const crowdIsVery = crowd.toLowerCase().includes('very');

// Inside the component return:
&amp;lt;div
  className={`flex items-center gap-1.5 text-sm font-bold px-3 py-1 rounded-full ${
    crowdIsVery ? 'bg-red-50 text-red-600' : 'bg-amber-50 text-amber-600'
  }`}
&amp;gt;
  &amp;lt;Users size={14} /&amp;gt;
  &amp;lt;span&amp;gt;{crowd}&amp;lt;/span&amp;gt;
&amp;lt;/div&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Result
&lt;/h3&gt;

&lt;p&gt;A hardware node captures the real world. A Python backend processes it concurrently with dual AI models. A React app visualizes it for the user in real time.&lt;/p&gt;

&lt;p&gt;This architecture successfully bridges physical IoT with cloud AI.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>fastapi</category>
      <category>iot</category>
      <category>python</category>
    </item>
    <item>
      <title>Building a Live Bus Tracker with ESP32-CAM, GPS, and Cellular Data (Part 1)</title>
      <dc:creator>Anhaj Uwaisulkarni</dc:creator>
      <pubDate>Sun, 15 Feb 2026 06:51:09 +0000</pubDate>
      <link>https://dev.to/anhaj0/building-a-live-bus-tracker-with-esp32-cam-gps-and-cellular-data-part-1-66c</link>
      <guid>https://dev.to/anhaj0/building-a-live-bus-tracker-with-esp32-cam-gps-and-cellular-data-part-1-66c</guid>
      <description>&lt;p&gt;Public transportation has a massive data problem. Commuters constantly face unpredictable arrival times and have no idea how crowded a bus is until it pulls up.&lt;/p&gt;

&lt;p&gt;I decided to fix this by building a self-contained IoT device for buses. It tracks live GPS coordinates and uses an onboard camera to capture the interior.&lt;/p&gt;

&lt;p&gt;This is Part 1 of my case study. I will break down the hardware node, the C++ firmware, and how I managed to reliably transmit image data over a 2G cellular network.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Hardware Stack
&lt;/h3&gt;

&lt;p&gt;The goal was to keep the unit low-cost but capable of handling network failovers and image processing.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;ESP32-CAM:&lt;/strong&gt; The central brain of the operation. It handles the logic, captures the JPEG image, and manages the network connections. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;NEO-6M GPS:&lt;/strong&gt; Connected via serial to constantly pull latitude, longitude, and speed data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SIM800L Module:&lt;/strong&gt; Handles the cellular data transmission. Getting images over a 2G connection is tough, but necessary for mobile transit tracking.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LM2596 Buck Converter:&lt;/strong&gt; Steps down the noisy 12V/24V bus electrical supply to a clean 5V to keep the modules from frying.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Firmware: Solving Network Drops
&lt;/h3&gt;

&lt;p&gt;I wrote the C++ firmware to handle network failovers automatically. The ESP32 first attempts a WiFi connection (useful for debugging at the terminal). If that fails, it instantly restarts the SIM module and falls back to a GPRS cellular connection.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
  // Network Failover Logic
  if (WiFi.status() == WL_CONNECTED) {
    Serial.println("\n✅ WiFi Connected!");
    activeClient = &amp;amp;wifiClient;
    connected = true;
  } else {
    Serial.println("\n❌ WiFi Failed. Trying SIM800L...");
    modem.restart();

    // Attempt GPRS connection with a 10-second timeout
    unsigned long gsmStart = millis();
    while (!modem.isGprsConnected() &amp;amp;&amp;amp; millis() - gsmStart &amp;lt; 10000) {
      modem.gprsConnect(apn, gprsUser, gprsPass);
    }
  }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Firmware: Chunking Image Data
&lt;/h3&gt;

&lt;p&gt;The biggest headache in IoT development is memory management. The ESP32-CAM does not have the RAM to load a massive HTTP POST request into memory all at once.&lt;/p&gt;

&lt;p&gt;If you try to send the entire JPEG buffer and the HTTP headers in a single client.print() command, the board will crash and reboot.&lt;/p&gt;

&lt;p&gt;To solve this, I injected the GPS coordinates into custom HTTP headers, and then sent the actual image binary in strict 1024-byte chunks.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
      // Sending the whole JPEG buffer in chunks to prevent crashes
      uint8_t *fbBuf = fb-&amp;gt;buf;
      size_t fbLen = fb-&amp;gt;len;
      size_t sent = 0;
      const size_t CHUNK_SIZE = 1024;

      while (sent &amp;lt; fbLen) {
        size_t toSend = CHUNK_SIZE;
        if (fbLen - sent &amp;lt; CHUNK_SIZE) {
          toSend = fbLen - sent;  // Last chunk
        }

        activeClient-&amp;gt;write(fbBuf + sent, toSend);
        sent += toSend;
        delay(50);  // Buffer breathing room
      }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This 50ms delay between chunks ensures the SIM800L module does not get overwhelmed and drop packets over the slow 2G network.&lt;/p&gt;
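&lt;p&gt;A quick back-of-envelope estimate shows what that pacing costs per frame. Both the 30 KB image size and the ~10 KB/s GPRS uplink rate are assumed figures, not measurements from the device:&lt;/p&gt;

```python
# Illustrative per-image upload estimate over 2G (assumed figures)
IMAGE_SIZE = 30 * 1024        # assumed 30 KB JPEG
CHUNK_SIZE = 1024             # bytes per write, matching the firmware
CHUNK_DELAY_S = 0.050         # the 50 ms pause between chunks
GPRS_RATE = 10 * 1024         # assumed ~10 KB/s sustained uplink

chunks = -(-IMAGE_SIZE // CHUNK_SIZE)    # ceiling division
pacing_s = chunks * CHUNK_DELAY_S        # time spent in delay(50)
wire_s = IMAGE_SIZE / GPRS_RATE          # time spent transmitting

print(f"{chunks} chunks, ~{pacing_s + wire_s:.1f} s per image")
# -> 30 chunks, ~4.5 s per image
```

&lt;p&gt;Under these assumptions the pacing adds about 1.5 seconds per frame, a fair price for not dropping packets on a 2G link.&lt;/p&gt;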

&lt;h3&gt;
  
  
  What's Next?
&lt;/h3&gt;

&lt;p&gt;Getting the hardware to reliably capture and transmit data from a moving vehicle is only half the battle.&lt;/p&gt;

&lt;p&gt;In Part 2, I will break down the cloud infrastructure. I will show how I deployed a Python backend to Hugging Face, used the Google Gemini API to analyze the images for crowd density, and synced it all in real-time to a React frontend.&lt;/p&gt;

</description>
      <category>iot</category>
      <category>esp32</category>
      <category>cpp</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
