Handling 8 TB of mobile pings every month is a nightmare for traditional clustering. I spent a lot of time with DBSCAN: the results were great, but the compute cost at this scale is a killer.
My current "lazy" experiment: Stop clustering and start indexing.
I’m treating H3 hexagons as pre-realized clusters. Instead of running expensive algorithms, I’m letting the spatial grid do the discretization for me.
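A minimal sketch of the "index instead of cluster" idea, under stated assumptions: each ping is mapped to a cell ID, pings are counted per cell, and any cell above a threshold is treated as a cluster. To keep the sketch dependency-free, `square_cell` is a hypothetical stand-in that snaps points to a square lat/lng grid. It is not H3. With h3-py v4 installed, the cell lookup would be `h3.latlng_to_cell(lat, lng, res)` instead. The names `dense_cells` and `min_pings`, the 0.01-degree cell size, and the sample coordinates are all illustrative.

```python
from collections import Counter
from math import floor
from typing import Callable, Dict, Hashable, Iterable, Tuple

Ping = Tuple[float, float]  # (lat, lng) in degrees


def square_cell(lat: float, lng: float, size_deg: float = 0.01) -> Tuple[int, int]:
    """Stand-in for an H3 lookup: snap a point to a square grid cell.

    With h3-py v4 this would be h3.latlng_to_cell(lat, lng, res).
    """
    return (floor(lat / size_deg), floor(lng / size_deg))


def dense_cells(
    pings: Iterable[Ping],
    cell_of: Callable[[float, float], Hashable] = square_cell,
    min_pings: int = 3,
) -> Dict[Hashable, int]:
    """Count pings per cell and keep cells with at least min_pings.

    One pass, no pairwise distance computations: the grid does the grouping.
    """
    counts = Counter(cell_of(lat, lng) for lat, lng in pings)
    return {cell: n for cell, n in counts.items() if n >= min_pings}


# Illustrative data: three pings close together, one far away.
pings = [
    (40.7581, -73.9855),
    (40.7583, -73.9851),
    (40.7589, -73.9859),
    (40.7302, -73.9352),
]
clusters = dense_cells(pings)
print(clusters)
```

The counting step is a plain group-by on the cell ID, so it maps onto whatever aggregation engine already holds the pings. One visible trade-off in this sketch: a dense spot that straddles a cell edge is split across neighboring cells, whereas DBSCAN would keep it as one cluster.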
If you’ve pushed spatial indexing to its breaking point at the terabyte level, tell me: where is this going to fail?
