We Should Have Never Built That Treasure Hunt Engine — A Postmortem

#webdev #machinelearning #programming #ai

The Problem We Were Actually Solving

We needed to keep 20 000 developers glued to the exhibition floor for 72 hours. Product had promised sponsors a 25 % lift in booth visits, and marketing had already booked the billboards. The old way, static QR codes, gave us a 7 % click rate. We wanted something dynamic, something that felt intelligent, something that could whisper to the user: "You are two meters from booth 14B, scan this now." The AI team showed us a notebook that turned 0.8 seconds of Wi‑Fi packets into a room‑level location. The numbers looked good in the notebook.

What We Tried First (And Why It Failed)

We reached for a lightweight transformer encoder fine‑tuned on 300 labeled floor plans. It ran at 1.9 ms per forward pass on an A100, so we piped every Wi‑Fi scan through it. But the encoder was 32 MiB on disk, and we had to run it on every handset. On a Pixel 6, that added 50–80 ms of latency and drained the battery. Users started complaining that the app became sluggish after three scans. Then the first live failure hit: when two access points collided, the transformer hallucinated a booth that didn't exist. The app showed a glowing blue dot in the hallway and said "Scan booth 99G." There was no booth 99G. Sponsors were furious.

We tried to trim the model by quantizing it to int8. The size dropped to 8 MiB and handset latency to 25 ms, but location error ballooned from 1.2 m to 4.7 m. A user stood at the coffee stand, and the app insisted they were in the keynote hall. By day two, half the scavenger tasks pointed to empty rooms, and users gave up. P99 latency for the entire flow (scanning, inference, card render) hit 1.8 s, far past the point where users start to perceive lag.

The Architecture Decision

We had to drop the transformer entirely. Instead, we fell back to a deterministic trilateration engine that had shipped in six previous events. It used received signal strength (RSSI) from three or more access points, a simple path‑loss model, and a 1 Hz Kalman filter. We added a fast fallback: if fewer than three APs were visible, we used the last known good location plus a decay factor. The model was 512 bytes and ran in 0.3 ms on the handset. We wrapped it in a Go micro‑service fronted by gRPC so we could still log every scan for offline tuning.
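
For readers who have not built one, here is a minimal sketch of the RSSI-to-position step described above: a log-distance path-loss model, a three-AP trilateration solve, and the fewer-than-three-APs fallback. It is an illustration under stated assumptions, not the engine that shipped. The constants (txPowerAt1m, pathLossExp, decay), the type and function names, and the reading of "decay factor" as a confidence that shrinks on every fallback scan are all hypothetical. The Kalman smoothing stage is left out to keep the sketch short.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// AP is one access point: known floor position in meters, measured RSSI in dBm.
type AP struct {
	X, Y float64
	RSSI float64
}

// Fix is a position estimate with a confidence in [0, 1].
type Fix struct {
	X, Y, Confidence float64
}

// Hypothetical values; real ones would come from calibrating the venue.
const (
	txPowerAt1m = -40.0 // assumed RSSI in dBm at 1 m from an AP
	pathLossExp = 2.5   // assumed indoor path-loss exponent
	decay       = 0.8   // assumed per-scan confidence decay in fallback mode
)

// rssiToDistance inverts the log-distance path-loss model
// RSSI = txPowerAt1m - 10*n*log10(d).
func rssiToDistance(rssi float64) float64 {
	return math.Pow(10, (txPowerAt1m-rssi)/(10*pathLossExp))
}

// distanceToRSSI is the forward model, used here only to build demo input.
func distanceToRSSI(d float64) float64 {
	return txPowerAt1m - 10*pathLossExp*math.Log10(d)
}

// trilaterate uses the first three APs. Subtracting the first circle
// equation from the other two leaves a 2x2 linear system in (x, y).
func trilaterate(aps []AP) (x, y float64, err error) {
	if len(aps) < 3 {
		return 0, 0, errors.New("need at least three access points")
	}
	a, b, c := aps[0], aps[1], aps[2]
	da, db, dc := rssiToDistance(a.RSSI), rssiToDistance(b.RSSI), rssiToDistance(c.RSSI)

	a1, b1 := 2*(b.X-a.X), 2*(b.Y-a.Y)
	c1 := da*da - db*db + b.X*b.X - a.X*a.X + b.Y*b.Y - a.Y*a.Y
	a2, b2 := 2*(c.X-a.X), 2*(c.Y-a.Y)
	c2 := da*da - dc*dc + c.X*c.X - a.X*a.X + c.Y*c.Y - a.Y*a.Y

	det := a1*b2 - a2*b1
	if math.Abs(det) < 1e-9 {
		return 0, 0, errors.New("access points are collinear")
	}
	// Cramer's rule on the 2x2 system.
	return (c1*b2 - c2*b1) / det, (a1*c2 - a2*c1) / det, nil
}

// locate trilaterates when it can; otherwise it keeps the last fix
// and shrinks its confidence.
func locate(aps []AP, last Fix) Fix {
	if x, y, err := trilaterate(aps); err == nil {
		return Fix{X: x, Y: y, Confidence: 1}
	}
	return Fix{X: last.X, Y: last.Y, Confidence: last.Confidence * decay}
}

func main() {
	// A handset at (3, 4) with three APs at known positions.
	aps := []AP{
		{0, 0, distanceToRSSI(5)},
		{10, 0, distanceToRSSI(math.Sqrt(65))},
		{0, 10, distanceToRSSI(math.Sqrt(45))},
	}
	fix := locate(aps, Fix{})
	fmt.Printf("fix: (%.2f, %.2f) confidence %.2f\n", fix.X, fix.Y, fix.Confidence)

	// Only two APs visible: the last fix is reused with lower confidence.
	fix = locate(aps[:2], fix)
	fmt.Printf("fallback: (%.2f, %.2f) confidence %.2f\n", fix.X, fix.Y, fix.Confidence)
}
```

With noiseless input like the demo, the solve should land very close to (3, 4). Real RSSI is noisy, which is why a filter sits on top of a step like this.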

To keep the magic feel, we kept a lightweight LLM only for the card text. We fine‑tuned a 124 M parameter model on past scavenger‑hunt transcripts, then froze it. At runtime, we fed it the booth name, the exhibit title, and the sponsor's tagline; it spat out a 280‑character card. The whole generation took 450 ms on an M2‑class Mac, well within the 1 s SLA. We never let the LLM see raw location data, so hallucinations stayed at 0.3 %.
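
One way to enforce that boundary is at the type level: the struct handed to the text model has no location fields at all, and the output is clamped to the card length. The sketch below assumes a hypothetical prompt format and takes the model as a plain function argument; CardInput, buildPrompt, clampCard, and renderCard are illustrative names, and counting the 280 characters as Unicode code points is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// CardInput holds the only three fields the text model is given.
// It deliberately has no coordinates or RSSI data.
type CardInput struct {
	BoothName    string
	ExhibitTitle string
	Tagline      string
}

// maxCardRunes is the card length limit, counted in runes (an assumption).
const maxCardRunes = 280

// buildPrompt formats the three fields. The wording is hypothetical.
func buildPrompt(in CardInput) string {
	return fmt.Sprintf(
		"Booth: %s\nExhibit: %s\nTagline: %s\nWrite one scavenger-hunt card of at most %d characters.",
		strings.TrimSpace(in.BoothName),
		strings.TrimSpace(in.ExhibitTitle),
		strings.TrimSpace(in.Tagline),
		maxCardRunes,
	)
}

// clampCard trims surrounding whitespace and cuts the text to at most
// maxCardRunes runes, so an over-long generation cannot overflow the card.
func clampCard(text string) string {
	r := []rune(strings.TrimSpace(text))
	if len(r) > maxCardRunes {
		r = r[:maxCardRunes]
	}
	return string(r)
}

// renderCard wires the pieces together. generate stands in for the frozen
// model; any func(string) string works, which keeps the sketch testable.
func renderCard(in CardInput, generate func(prompt string) string) string {
	return clampCard(generate(buildPrompt(in)))
}

func main() {
	// Stub generator in place of a real model call.
	stub := func(prompt string) string {
		return "  Find the booth with the smallest model on the floor.  "
	}
	in := CardInput{BoothName: "14B", ExhibitTitle: "Edge Inference", Tagline: "Ship small"}
	fmt.Println(renderCard(in, stub))
}
```

Because the generator only ever receives the prompt string, there is no code path by which a bad position estimate can leak into the card text.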

What The Numbers Said After

After we rolled back to trilateration plus the frozen LLM, booth visit rate climbed to 22 %, still shy of the 25 % promise, but safe. P99 latency dropped to 620 ms. Battery drain per scan fell from 1.2 % to 0.2 %. Infra cost went from three A100 nodes to two t3.large instances running 24/7. The error radius stayed under 2 m for 94 % of scans. Sponsors stopped complaining, product declared victory, and I updated my résumé.

What I Would Do Differently

I would not have let the research team shoehorn a transformer into a latency‑sensitive mobile pipeline. A 32 MiB model that needs an A100 is theater, not engineering. If I had to do this again, I would build an offline trilateration engine first, log failures ruthlessly, then layer on small generative models for polish—not for location. And I would set an SLA at 500 ms on the handset before any demo leaves the lab. Anything slower is already too slow.
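
A latency budget like that is easy to turn into a gate that runs before a demo leaves the lab. The sketch below computes a nearest-rank percentile over recorded handset timings and compares it with the 500 ms budget. Checking the budget at P99 is my assumption (any percentile could be used), and percentile, withinBudget, ms, and msSlice are hypothetical helper names.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// handsetBudget is the 500 ms target from the text.
const handsetBudget = 500 * time.Millisecond

// ms and msSlice are small helpers for building sample data.
func ms(n int) time.Duration { return time.Duration(n) * time.Millisecond }

func msSlice(ns ...int) []time.Duration {
	out := make([]time.Duration, 0, len(ns))
	for _, n := range ns {
		out = append(out, ms(n))
	}
	return out
}

// percentile returns the nearest-rank percentile p (1..100) of samples.
// It returns 0 for an empty slice and does not modify the input.
func percentile(samples []time.Duration, p int) time.Duration {
	if len(samples) == 0 {
		return 0
	}
	sorted := append([]time.Duration(nil), samples...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	// Integer ceiling of p*n/100 avoids floating-point rounding surprises.
	rank := (p*len(sorted) + 99) / 100
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

// withinBudget reports whether the P99 of samples meets the handset budget.
func withinBudget(samples []time.Duration) bool {
	return percentile(samples, 99) <= handsetBudget
}

func main() {
	samples := msSlice(120, 310, 450, 620)
	fmt.Println("p99:", percentile(samples, 99), "within budget:", withinBudget(samples))
}
```

The timings need to be recorded on a real handset for this to mean anything; an A100 number fed into the same check would pass and prove nothing.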
