When I started building a location-based AR platform, the part I thought would be easy turned out to be the hardest: efficiently querying which AR objects are "near" a user in real time, then rendering them correctly in 3D space relative to the camera. GPS coordinates feel simple until you're trying to do spatial radius queries on thousands of user-placed objects and then translate those coordinates into a Three.js scene that doesn't drift or jitter as the user moves.
This post is about that problem. If you're building anything with geo-anchored content — AR apps, location games, place-based social features — you've probably run into the same wall.
The Core Problem: Bridging GPS Space and 3D Render Space
GPS gives you latitude and longitude (and sometimes altitude). Three.js works in a local coordinate system measured in meters (or whatever unit you define). These two worlds don't talk to each other naturally.
The naive approach is to just subtract coordinates:
// DON'T do this
const dx = targetLon - userLon;
const dy = targetLat - userLat;
object.position.set(dx, 0, dy);
This breaks down immediately because one degree of longitude near the equator is about 111km, but near the poles it's nearly zero. Your scene will look fine if you're testing in one place, then completely fall apart for users somewhere else — or even for the same user moving north.
The correct approach is to convert from WGS84 (GPS coordinates) to a local East-North-Up (ENU) coordinate system centered on the user's current position. This gives you actual metric distances in a flat plane.
// Convert lat/lon to ENU coordinates relative to an origin point
function gpsToENU(targetLat, targetLon, originLat, originLon) {
  const R = 6371000; // Mean Earth radius in meters
  const dLat = (targetLat - originLat) * (Math.PI / 180);
  const dLon = (targetLon - originLon) * (Math.PI / 180);
  const avgLat = ((targetLat + originLat) / 2) * (Math.PI / 180);
  // East (X) and North in meters; cos(avgLat) corrects for
  // longitude degrees shrinking toward the poles
  const east = R * dLon * Math.cos(avgLat);
  const north = R * dLat;
  // Three.js cameras look down -Z, so north maps to negative Z
  return { x: east, y: 0, z: -north };
}

// Usage
const userPos = { lat: 35.6812, lon: 139.7671 }; // Tokyo example
const anchorPos = { lat: 35.6815, lon: 139.7680 };
const enuCoords = gpsToENU(
  anchorPos.lat, anchorPos.lon,
  userPos.lat, userPos.lon
);
arObject.position.set(enuCoords.x, enuCoords.y, enuCoords.z);
The key insight: you recalculate this every time the user moves. The origin is always the user's current position, so objects near them are close to (0,0,0) in scene space and objects far away are simply further from origin. This keeps your floating point numbers small and precise, which matters a lot for AR stability.
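To make the re-anchoring loop concrete, here's a minimal sketch. The `reanchorAll` function, the `anchors` array shape, and the plain-object `mesh.position` are my own illustrative names, not from a real API; `gpsToENU` is restated from above so the snippet is self-contained:

```javascript
// Same conversion as in the previous section
function gpsToENU(targetLat, targetLon, originLat, originLon) {
  const R = 6371000;
  const dLat = (targetLat - originLat) * (Math.PI / 180);
  const dLon = (targetLon - originLon) * (Math.PI / 180);
  const avgLat = ((targetLat + originLat) / 2) * (Math.PI / 180);
  return { x: R * dLon * Math.cos(avgLat), y: 0, z: -(R * dLat) };
}

// Recompute every anchor's scene position on each significant GPS fix,
// keeping the user at (0,0,0). `anchors` holds { lat, lon, mesh } records.
function reanchorAll(anchors, userLat, userLon) {
  for (const a of anchors) {
    const p = gpsToENU(a.lat, a.lon, userLat, userLon);
    a.mesh.position = p; // with Three.js: a.mesh.position.set(p.x, p.y, p.z)
  }
}
```

An anchor a few hundred milliseconds' walk north of the user ends up a few tens of meters down -Z, and the numbers stay small no matter where on Earth the session runs.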
The Database Side: Spatial Queries with PostGIS
Before you can render anything, you need to know which AR objects exist near the user. This is where a lot of people reach for a simple "find all objects and filter client-side" approach. Works fine at 100 objects. Terrible at 50,000.
PostGIS on top of Postgres (which Supabase uses) gives you proper spatial indexing. Here's the table setup:
-- Enable PostGIS extension (already enabled in Supabase)
create extension if not exists postgis;

-- AR content anchors table
create table ar_anchors (
  id uuid primary key default gen_random_uuid(),
  user_id uuid references auth.users(id),
  title text not null,
  content_type text check (content_type in ('3d_model', 'text', 'photo')),
  content_url text,
  metadata jsonb default '{}',
  location geography(Point, 4326) not null,
  created_at timestamptz default now()
);

-- Spatial index — critical for performance
create index ar_anchors_location_idx
  on ar_anchors using gist (location);
-- Query: find anchors within 500 meters of user
-- You'd call this via a Supabase RPC or directly
select
  id,
  title,
  content_type,
  content_url,
  metadata,
  ST_Y(location::geometry) as lat,
  ST_X(location::geometry) as lon,
  ST_Distance(
    location,
    ST_MakePoint(139.7671, 35.6812)::geography
  ) as distance_meters
from ar_anchors
where ST_DWithin(
  location,
  ST_MakePoint(139.7671, 35.6812)::geography,
  500 -- radius in meters
)
order by distance_meters asc;
ST_DWithin with a geography type gives you geodesic distance calculations that are accurate regardless of where on earth you are. It also uses your spatial index, so this query stays fast even with a lot of data.
In Supabase you'd expose this as an RPC function and call it from your client whenever the user's position changes significantly (I use a 10-meter threshold to avoid spamming queries on every GPS tick).
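The movement threshold is just a distance check against the position of the last fetch. A minimal sketch of that gate (the `shouldRefetch` and `haversineMeters` names are my own; the haversine formula is standard):

```javascript
// Great-circle distance between two GPS fixes, in meters
function haversineMeters(lat1, lon1, lat2, lon2) {
  const R = 6371000;
  const toRad = (d) => (d * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(a));
}

let lastFetchPos = null;

// True when the user has moved far enough to justify hitting the RPC again
function shouldRefetch(lat, lon, thresholdMeters = 10) {
  if (lastFetchPos === null) {
    lastFetchPos = { lat, lon };
    return true; // first fix always fetches
  }
  const moved = haversineMeters(lastFetchPos.lat, lastFetchPos.lon, lat, lon);
  if (moved >= thresholdMeters) {
    lastFetchPos = { lat, lon };
    return true;
  }
  return false;
}
```

In a `navigator.geolocation.watchPosition` callback you'd call the RPC only when `shouldRefetch` returns true, so GPS jitter on a stationary phone doesn't turn into a stream of identical queries.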
Handling the Camera Orientation Problem
Here's where things get tricky for actual AR. You need to align your Three.js camera with the physical direction the device is facing. The DeviceOrientationEvent gives you alpha (compass heading), beta (tilt), and gamma (roll). Mapping these to a Three.js camera rotation without gimbal lock or drift is genuinely annoying.
A few things I learned the hard way:
Don't use Euler angles directly. The order matters and you'll spend hours debugging rotation weirdness. Use quaternions.
Alpha isn't always compass-relative. On iOS, alpha is relative to the device's orientation when the page loaded (the true compass heading lives in the non-standard webkitCompassHeading property). On Android with Chrome, alpha is relative to magnetic north when the event's absolute flag is true. You need to handle both cases, ideally by calling DeviceOrientationEvent.requestPermission() on iOS 13+ and checking the absolute property on each event.
The drift problem is real. Magnetometers in phones are noisy. Any metal nearby (a car, a building with rebar) will throw off your compass. You can apply a low-pass filter to smooth jitter, but there's no perfect solution — this is a hardware limitation.
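Here's one way to write that low-pass filter. The subtlety is the 0/360 wraparound: naively averaging 359° and 1° gives 180°, the exact opposite direction, so you filter the shortest angular difference instead. This is a sketch with my own names; the smoothing factor 0.15 is a starting point, not a tuned value:

```javascript
// Exponential low-pass filter for compass headings (degrees, 0-360).
// Filters the shortest signed angular difference so readings that
// cross the 0/360 wrap don't swing through the opposite heading.
function smoothHeading(prev, raw, alpha = 0.15) {
  if (prev === null) return raw; // no history yet: accept the raw reading
  let diff = raw - prev;
  if (diff > 180) diff -= 360;   // take the short way around the circle
  if (diff < -180) diff += 360;
  return (((prev + alpha * diff) % 360) + 360) % 360;
}
```

Feed each raw alpha (or webkitCompassHeading) reading through this before using it; higher alpha tracks faster but passes more magnetometer noise through.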
// Simplified camera orientation from device sensors
function updateCameraOrientation(event, camera) {
  // event.alpha = heading (degrees, 0-360)
  // event.beta = tilt front/back (-180 to 180)
  // event.gamma = tilt left/right (-90 to 90)
  if (event.alpha === null || event.beta === null || event.gamma === null) {
    return;
  }
  const alphaRad = (event.alpha * Math.PI) / 180;
  const betaRad = (event.beta * Math.PI) / 180;
  const gammaRad = (event.gamma * Math.PI) / 180;
  // Build rotation quaternion. The spec defines intrinsic Z-X'-Y''
  // rotations, which in Three.js axis terms is Euler order 'YXZ'
  const euler = new THREE.Euler(betaRad, alphaRad, -gammaRad, 'YXZ');
  const quaternion = new THREE.Quaternion().setFromEuler(euler);
  // Rotate -90° about X so the camera looks out the back of the device
  quaternion.multiply(
    new THREE.Quaternion(-Math.sqrt(0.5), 0, 0, Math.sqrt(0.5))
  );
  // Compensate for screen orientation (portrait vs landscape)
  const screenOrientation = window.screen.orientation?.angle ?? 0;
  quaternion.multiply(
    new THREE.Quaternion().setFromAxisAngle(
      new THREE.Vector3(0, 0, 1),
      (-screenOrientation * Math.PI) / 180
    )
  );
  camera.quaternion.copy(quaternion);
}

// Attach listener
window.addEventListener('deviceorientation', (e) => {
  updateCameraOrientation(e, myThreeCamera);
}, true);
Even with this, you'll notice the camera "drifts" slightly over time on many devices. Some teams add a manual calibration step where users point their phone at a QR code or a known landmark to reset the heading. For social AR where exact precision isn't critical, a slight drift is often acceptable — users are forgiving if a virtual object is a few degrees off from where it "should" be.
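The calibration step itself is simple bookkeeping: when the user confirms they're aimed at a landmark whose true bearing you know, store the difference between that bearing and the raw sensor heading, and apply it to later readings. A sketch with entirely illustrative names (not a real API):

```javascript
// One possible manual-calibration scheme. `headingOffset` persists
// between readings; all names here are my own, not from a library.
let headingOffset = 0;

// Called when the user aims at a landmark with a known true bearing
function calibrate(trueBearingDeg, rawSensorHeadingDeg) {
  headingOffset =
    (((trueBearingDeg - rawSensorHeadingDeg) % 360) + 360) % 360;
}

// Applied to every subsequent sensor reading
function correctedHeading(rawSensorHeadingDeg) {
  return (((rawSensorHeadingDeg + headingOffset) % 360) + 360) % 360;
}
```

The offset won't fix ongoing drift, but it resets accumulated error to zero at the moment of calibration, which is often enough between drift episodes.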
PWA vs Native: What Actually Matters Here
If you're doing this as a PWA (which I am with CityCanvas), you face one specific limitation: DeviceOrientationEvent only fires in a secure context, and iOS 13+ additionally requires an explicit permission prompt via DeviceOrientationEvent.requestPermission(), which must be triggered from a user gesture like a button tap. Make absolutely sure your deployment is HTTPS, otherwise you'll get zero sensor data and no useful error message.
Capacitor (which wraps your web app as a native app) gives you access to native AR frameworks if you need them — ARKit on iOS, ARCore on Android. These handle a lot of the sensor fusion for you and give much better tracking. But they also add significant complexity to your build pipeline. For a first version, I'd suggest getting the web-based approach working first, then considering Capacitor/native AR as an enhancement.
The PWA approach also lets you test quickly in a browser on desktop using a webcam and keyboard controls simulating GPS movement, which is way faster than deploying to a phone every time.
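The GPS-simulation half of that is just the inverse of the ENU conversion: turn a step in meters (from an arrow-key handler) back into degrees. A small sketch, with `moveSimulatedFix` being my own name:

```javascript
// Nudge a simulated GPS fix by a step given in meters east/north.
// Inverse of the ENU conversion: meters back to degrees, with the
// same cos(lat) correction on the longitude axis.
function moveSimulatedFix(pos, eastMeters, northMeters) {
  const R = 6371000;
  const dLat = (northMeters / R) * (180 / Math.PI);
  const dLon =
    (eastMeters / (R * Math.cos((pos.lat * Math.PI) / 180))) *
    (180 / Math.PI);
  return { lat: pos.lat + dLat, lon: pos.lon + dLon };
}
```

Bind the arrow keys to ±1 m steps and feed the result through exactly the same code path your real geolocation callback uses, so desktop testing exercises the production logic.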
What I'd Do Differently
Looking back at the architecture decisions:
Start with a smaller search radius than you think you need. 500 meters sounds reasonable but if users are placing lots of content, that's potentially hundreds of objects in a city center. I now paginate the results and only fully load models for objects within 100 meters, showing simpler "beacon" markers for anything 100-500m away.
Think about content loading budget from day one. Fetching a 3D model for every anchor in range will destroy load times. You need LOD (level of detail) — simple sprites far away, full 3D models close up. Three.js has a LOD class for this.
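The tiering logic from the last two points reduces to a pure function of the PostGIS `distance_meters` column. The cutoffs mirror the 100 m / 500 m scheme above; `detailTier` and the tier names are my own:

```javascript
// Pick a render tier from distance: full 3D model close up, a cheap
// "beacon" marker at medium range, nothing beyond the search radius.
function detailTier(distanceMeters, modelRange = 100, beaconRange = 500) {
  if (distanceMeters <= modelRange) return 'model';
  if (distanceMeters <= beaconRange) return 'beacon';
  return 'hidden';
}
```

With Three.js you could also register the same tiers as levels on a THREE.LOD object and let the renderer switch automatically per frame instead of per GPS fix.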
GPS accuracy is wildly variable. In dense urban environments (tall buildings, the "urban canyon" effect), GPS can be off by 20-50 meters. Design your UX so a 20m placement error isn't catastrophic. Making objects bigger, or giving users a visual "search" interaction rather than just walking to an exact spot, both help.
Don't try to solve all the hard AR problems at once. Surface tracking, occlusion, persistent anchors across sessions — these are research-level problems. Ship something that's fun with imperfect GPS and orientation, then iterate.
The geo + 3D rendering combination is genuinely one of the more interesting technical areas right now, and there's a lot of room to build things that weren't possible a few years ago just using browser APIs and cheap cloud databases. Most of the hard stuff is solvable with the approaches above — the magic is in what you let users create and discover with it.
I wrote a more detailed guide on my blog covering the full architecture including the 3D model generation pipeline and Capacitor setup: https://mcw999.github.io/mcw999-hub/blog/city-canvas-guide/