DEV Community

Adam Hill
Adam Hill

Posted on • Edited on • Originally published at alldjango.com

2 2

Searching within an area with GeoDjango and PostGIS

One feature of InHerSight.com that has been proved to be extremely useful is the ability to use location as a filter when exploring potential companies. Luckily, we are using Django and PostgreSQL which has robust support for geographic querying using GeoDjango and PostGIS. I also utilize the awesome django-cities package for an astounding amount of data related to places.

There wasn't a lot of great documentation on StackOverflow or through my searching, so I did a lot of trial and error to get to a sufficiently performing solution.

A simplified version of the initial code looked something like this:

# models.py
from django.contrib.contenttypes.fields import GenericRelation
from django.contrib.gis.db.models import PointField
from cities.models import City

class Company(models.Model):
    name = models.CharField()
    places = GenericRelation(Place)

class Place(models.Model):
    point = PointField()
    city = models.ForeignKey(City)


# views.py
from django.db.models import F
from django.contrib.gis.db.models import PointField
from cities.models import City

distance_miles_to_search = 50
point = PointField()  # get a point of a particular place (usually by latitude/longitude)
companies = Company.objects.filter(places__point__distance_lte=(point, D(mi=distance_miles_to_search))).annotate(closest_city_id=F('places__city'))
Enter fullscreen mode Exit fullscreen mode

The distance_lte filter definitely worked for the initial MVP, however, as we continued to add in companies and locations, we started noticing some blocking when searching for companies by location. Then, when we would get a burst of traffic the slowdowns became even more evident. New Relic and Heroku's metrics panels were very helpful pinpointing where the slowdowns were happening. However, even after many rounds of optimizations, InHerSight hit a brickwall trying to squeeze any more performance from the existing architecture.

I thought I could reduce the blocking by setting up a readonly replica for the master database by distributing the reads across two databases. That definitely helped, but even with minimal load distance-based queries were still around 2500ms -- well above the threshold for acceptability.

The distance_lte filter translates to the ST_Distance function in PostGIS. I had read in various places (specifically, http://stackoverflow.com/questions/7845133/how-can-i-query-all-my-data-within-a-distance-of-5-meters and http://stackoverflow.com/questions/2235043/geodjango-difference-between-dwithin-and-distance-lt) that there was also the dwithin filter clause in GeoDjango, but I never could get it working correctly until digging into http://stackoverflow.com/questions/31940674/optimal-query-in-geodjango-postgis-for-locations-within-a-distance-in-meters. dwithin translates to the ST_DWithin function in PostGIS and uses a spatial index that should result in faster performance.

The simplified version of my code to use the dwithin filter clause:

# views.py
from django.contrib.gis.measure import D
from django.db.models import F
from cities.models import City


def convert_miles_to_degrees(miles):
    '''
    Convert miles to degrees. Somewhat imprecise, but probably good enough.
    '''

    meters = D(mi=miles).m
    degrees = (meters / 40000000 * 360)

    return degrees

degrees = convert_miles_to_degrees(50)
point = City.objects.get(pk=...).location
companies = Company.objects.filter(places__point__dwithin=(point, degrees))\
                .annotate(closest_city_id=F('places__city'))
Enter fullscreen mode Exit fullscreen mode

Using the ST_DWithin improved location queries by approximently 30x on my local development machine and a still very respectable 10x on the live databases. That query now hovers around 200ms even with load. I'm caching data pretty aggressively when I can and there might be some opportunities to pre-compute some typical searches or use something like ElasticSearch or Lucene in the future, but just switching from ST_Distance to ST_DWithin significantly improved location-based query performance.

Originally published on adamghill.com.

Image of Timescale

Timescale – the developer's data platform for modern apps, built on PostgreSQL

Timescale Cloud is PostgreSQL optimized for speed, scale, and performance. Over 3 million IoT, AI, crypto, and dev tool apps are powered by Timescale. Try it free today! No credit card required.

Try free

Top comments (0)

Billboard image

The Next Generation Developer Platform

Coherence is the first Platform-as-a-Service you can control. Unlike "black-box" platforms that are opinionated about the infra you can deploy, Coherence is powered by CNC, the open-source IaC framework, which offers limitless customization.

Learn more

👋 Kindness is contagious

Engage with a sea of insights in this enlightening article, highly esteemed within the encouraging DEV Community. Programmers of every skill level are invited to participate and enrich our shared knowledge.

A simple "thank you" can uplift someone's spirits. Express your appreciation in the comments section!

On DEV, sharing knowledge smooths our journey and strengthens our community bonds. Found this useful? A brief thank you to the author can mean a lot.

Okay