<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: João Vasconcelos</title>
    <description>The latest articles on DEV Community by João Vasconcelos (@jmnmv12).</description>
    <link>https://dev.to/jmnmv12</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F400615%2F483fed47-08fd-461b-997c-44a9d03f89e3.jpeg</url>
      <title>DEV Community: João Vasconcelos</title>
      <link>https://dev.to/jmnmv12</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jmnmv12"/>
    <language>en</language>
    <item>
      <title>Unveiling the Power of Spatial Clustering with PostgreSQL</title>
      <dc:creator>João Vasconcelos</dc:creator>
      <pubDate>Fri, 19 Jan 2024 13:53:44 +0000</pubDate>
      <link>https://dev.to/jmnmv12/unveiling-the-power-of-spatial-clustering-with-postgresql-5bbo</link>
      <guid>https://dev.to/jmnmv12/unveiling-the-power-of-spatial-clustering-with-postgresql-5bbo</guid>
      <description>&lt;p&gt;Spatial clustering, a technique pivotal in organizing geographical data points based on proximity, fuels the efficient display and analysis of location-based information across numerous platforms. Think about the way Airbnb showcases property listings or how Booking.com handles vast arrays of accommodations - these popular websites rely on spatial clustering to streamline user experiences.&lt;/p&gt;

&lt;p&gt;While client-side libraries like &lt;a href="https://www.mapbox.com/" rel="noopener noreferrer"&gt;Mapbox&lt;/a&gt; offer clustering capabilities, they can face performance challenges, especially with extensive datasets. In such cases, performing spatial clustering on the server-side, particularly using a robust database like PostgreSQL, emerges as a more efficient solution.&lt;/p&gt;

&lt;p&gt;This article dives into the significance of spatial clustering in enhancing performance, shedding light on how PostgreSQL, with its spatial extension, PostGIS, allow us to manage and optimize the usage of spatial data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F543ovc1hopmgoa3vyirz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F543ovc1hopmgoa3vyirz.png" alt="PostGis logo"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Requirements
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Postgres with the PostGIS extension installed. You can use the docker-compose file below, it uses the official &lt;a href="https://hub.docker.com/r/postgis/postgis" rel="noopener noreferrer"&gt;PostGIS&lt;/a&gt; image for running Postgres with the PostGIS extension installed.&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;

version: '3'

services:
  postgres:
    image: postgis/postgis:latest
    container_name: my_postgres_container
    ports:
      - "5432:5432"
    environment:
      POSTGRES_DB: mydatabase
      POSTGRES_USER: myuser
      POSTGRES_PASSWORD: mypassword
    volumes:
      - postgres_data:/var/lib/postgresql/data

volumes:
  postgres_data:



&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Setting up the data
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Model definition
&lt;/h3&gt;

&lt;p&gt;Before we can cluster anything, we need data and with that in mind, we are going to create the following table in our database:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;last_location&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;SERIAL&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;lat&lt;/span&gt; &lt;span class="nb"&gt;NUMERIC&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;lng&lt;/span&gt; &lt;span class="nb"&gt;NUMERIC&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;geom&lt;/span&gt; &lt;span class="n"&gt;GEOMETRY&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Point&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3395&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="c1"&gt;-- PostGIS Point field with SRID 3395&lt;/span&gt;
    &lt;span class="n"&gt;created_on&lt;/span&gt; &lt;span class="nb"&gt;TIMESTAMP&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;modified_on&lt;/span&gt; &lt;span class="nb"&gt;TIMESTAMP&lt;/span&gt; &lt;span class="k"&gt;DEFAULT&lt;/span&gt; &lt;span class="k"&gt;CURRENT_TIMESTAMP&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;lat&lt;/code&gt; and the &lt;code&gt;lng&lt;/code&gt; fields represent the location in degrees following the &lt;a href="https://epsg.io/4326" rel="noopener noreferrer"&gt;WGS84&lt;/a&gt; reference system used in GPS. The &lt;code&gt;geom&lt;/code&gt; field will be a &lt;code&gt;geometry&lt;/code&gt; a spatial data type used to represent a feature in planar (Euclidean) coordinate systems. &lt;/p&gt;

&lt;p&gt;In this case, we will represent a point in the World Mercator spatial reference system, hence the SRID 3395. The &lt;a href="https://epsg.io/3395" rel="noopener noreferrer"&gt;World Mercator&lt;/a&gt; reference system represents almost the entire world (it excludes the polar regions) in meters.&lt;/p&gt;

&lt;p&gt;You might be asking "Why do we need 2 distinct ways of representing the location (WSG84 VS World Mercator)?"&lt;/p&gt;

&lt;p&gt;While the WSG84 system is great for representing and visualizing the location on a map (e.g. representing the points on a Web app,...)the World Mercator representation in meters will make the cluster calculations easier since it's a lot simpler to think of a cluster radius in meters than degrees.&lt;/p&gt;

&lt;h3&gt;
  
  
  Populating the data
&lt;/h3&gt;

&lt;p&gt;Now that we have our table ready to store the locations, we will need to populate it with data. We are not going to be super precise with the location, some of the data points might be located in the sea, but the goal here is to see how the clustering works.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;last_location&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lng&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;geom&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;lat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;lng&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ST_TRANSFORM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ST_Point&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lng&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4326&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;3395&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;geom&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;RANDOM&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;49&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;384358&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;24&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;396308&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;24&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;396308&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;lat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;-- Range of latitudes in the continental US&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;RANDOM&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;66&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;93457&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;125&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;001636&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;125&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;001636&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;lng&lt;/span&gt; &lt;span class="c1"&gt;-- Range of longitudes in the continental US&lt;/span&gt;
    &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;generate_series&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;random_locations&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Using the SQL statement above, we are going to generate 500 random records using the range of latitudes and longitudes of the continental US. The &lt;code&gt;geom&lt;/code&gt; field is calculated by transforming the WSG84 representation, represented by the SRID (spatial reference identifier) 4326, into the World Mercator representation, SRID 3395.&lt;/p&gt;

&lt;p&gt;In the end, you should have 500 locations spread across the US and adjacent areas. The image below was taken from &lt;a href="https://dbeaver.com/docs/dbeaver/Working-with-Spatial-GIS-data/" rel="noopener noreferrer"&gt;DBeaver&lt;/a&gt;, it has a spatial data viewer which helps a lot when working with this type of data.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F43xlea0irvb8v4shom96.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F43xlea0irvb8v4shom96.png" alt="Location spread across the US"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Clustering the data
&lt;/h3&gt;

&lt;p&gt;After all this work, we finally got to the juicy part, clustering the data.&lt;/p&gt;

&lt;p&gt;To cluster the data we are going to use an operation provided by PostGIS, &lt;a href="https://postgis.net/docs/ST_ClusterDBSCAN.html" rel="noopener noreferrer"&gt;ST_CLUSTERDBSCAN&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This is a window function that returns a cluster number for each input geometry. Besides the geometry, you will need to specify the desired distance (&lt;code&gt;eps&lt;/code&gt;) and the density (&lt;code&gt;minpoints&lt;/code&gt;) parameters to determine each cluster. Since this is a window function and one geometry might be eligible to belong to multiple clusters, to ensure deterministic results we need to use an ORDER BY clause in the window definition.&lt;/p&gt;

&lt;p&gt;It's important to understand that the value you assign to &lt;code&gt;eps&lt;/code&gt; will take the unit of the geometry you are trying to cluster, that means that if your geometry is represented in degrees and &lt;code&gt;eps&lt;/code&gt; is 1000, you will be trying to form a cluster with a distance of 1000º and not 1000 meters or another length unit (I fell into this trap and spent about a day trying to understand why my clustering wasn't working).&lt;/p&gt;

&lt;p&gt;Let's try to use it and analyse the output.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

&lt;span class="c1"&gt;-- Query&lt;/span&gt;
&lt;span class="c1"&gt;-- Cluster with a distance of 100000 meters&lt;/span&gt;
&lt;span class="c1"&gt;-- and a minimum of 2 locations to form a cluster&lt;/span&gt;
&lt;span class="k"&gt;select&lt;/span&gt;
    &lt;span class="n"&gt;ST_CLUSTERDBSCAN&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;geom&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;eps&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;minpoints&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;over&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
&lt;span class="k"&gt;order&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;cluster_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;geom&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt;
    &lt;span class="n"&gt;last_location&lt;/span&gt; &lt;span class="n"&gt;ll&lt;/span&gt;
&lt;span class="k"&gt;order&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt;
    &lt;span class="n"&gt;cluster_id&lt;/span&gt;
&lt;span class="k"&gt;limit&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;-- Result&lt;/span&gt;
&lt;span class="n"&gt;cluster_id&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;geom&lt;/span&gt;                                          &lt;span class="o"&gt;|&lt;/span&gt;
&lt;span class="c1"&gt;----------+----------------------------------------------+&lt;/span&gt;
         &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;POINT&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;9620039&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;471564963&lt;/span&gt; &lt;span class="mi"&gt;4213160&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;92111902&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="o"&gt;|&lt;/span&gt;
         &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;POINT&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;9685949&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;732485048&lt;/span&gt; &lt;span class="mi"&gt;4238072&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;815488585&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="o"&gt;|&lt;/span&gt;
         &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;POINT&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;9626448&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;620191192&lt;/span&gt; &lt;span class="mi"&gt;4284895&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;837818226&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="o"&gt;|&lt;/span&gt;
         &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;POINT&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;9739000&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;227136273&lt;/span&gt; &lt;span class="mi"&gt;4185753&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;078454269&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="o"&gt;|&lt;/span&gt;
         &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;POINT&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;10511656&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;94032765&lt;/span&gt; &lt;span class="mi"&gt;3263151&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;691116077&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="o"&gt;|&lt;/span&gt;
         &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;POINT&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;10657529&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;563062856&lt;/span&gt; &lt;span class="mi"&gt;3319832&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;321128275&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;
         &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;POINT&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;10585249&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;991795693&lt;/span&gt; &lt;span class="mi"&gt;3285868&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;5630962&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="o"&gt;|&lt;/span&gt;
         &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;POINT&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;10556119&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;395681698&lt;/span&gt; &lt;span class="mi"&gt;3311324&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;9947481216&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
         &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;POINT&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;13744886&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;491918504&lt;/span&gt; &lt;span class="mi"&gt;3118709&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;415406296&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;
         &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;POINT&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;13638278&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;89205634&lt;/span&gt; &lt;span class="mi"&gt;2927643&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;4725107397&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbynssmvf2bzs3xcpzd7o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbynssmvf2bzs3xcpzd7o.png" alt="Clusters"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Looking at the partial result above, we formed 3 distinct clusters, identified by the &lt;code&gt;cluster_id&lt;/code&gt; and we can see what points belong to each cluster.&lt;/p&gt;

&lt;h3&gt;
  
  
  Finding the center of the cluster
&lt;/h3&gt;

&lt;p&gt;Now that we have formed our clusters, we still need to find the centroid and calculate the number of locations per cluster. This is the information we would need to represent the cluster on a Web interface.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

&lt;span class="k"&gt;select&lt;/span&gt;
    &lt;span class="n"&gt;cluster_loc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cluster_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ST_Y&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;ST_TRANSFORM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;ST_Centroid&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ST_COLLECT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cluster_loc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;geom&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="mi"&gt;4326&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;lat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ST_X&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;ST_TRANSFORM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;ST_Centroid&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ST_COLLECT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cluster_loc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;geom&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="mi"&gt;4326&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;lng&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ST_Centroid&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ST_COLLECT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cluster_loc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;geom&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;centroid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;count&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cluster_loc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cluster_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;num_points&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;select&lt;/span&gt;
        &lt;span class="n"&gt;ST_CLUSTERDBSCAN&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;geom&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;eps&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;minpoints&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;over&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;order&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt;
        &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;cluster_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;geom&lt;/span&gt;
    &lt;span class="k"&gt;from&lt;/span&gt;
        &lt;span class="n"&gt;last_location&lt;/span&gt; &lt;span class="n"&gt;ll&lt;/span&gt;
    &lt;span class="k"&gt;order&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt;
        &lt;span class="n"&gt;cluster_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;cluster_loc&lt;/span&gt;
&lt;span class="k"&gt;group&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt;
    &lt;span class="n"&gt;cluster_loc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cluster_id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If you look at the results of the previous query, you will probably find a row with &lt;code&gt;cluster_id&lt;/code&gt; set to &lt;code&gt;null&lt;/code&gt;. This is because not all locations might be eligible to fit in a cluster so if you also want to represent the non-clustered locations you would execute the following query.&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;

&lt;p&gt;&lt;span class="k"&gt;select&lt;/span&gt;&lt;br&gt;
    &lt;span class="n"&gt;cluster_loc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cluster_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;br&gt;
    &lt;span class="n"&gt;ST_Y&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;br&gt;
        &lt;span class="n"&gt;ST_TRANSFORM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;br&gt;
            &lt;span class="n"&gt;cluster_loc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;geom&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;br&gt;
    &lt;span class="mi"&gt;4326&lt;/span&gt;&lt;br&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;&lt;br&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;lat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;br&gt;
    &lt;span class="n"&gt;ST_X&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;br&gt;
        &lt;span class="n"&gt;ST_TRANSFORM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;br&gt;
            &lt;span class="n"&gt;cluster_loc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;geom&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;br&gt;
    &lt;span class="mi"&gt;4326&lt;/span&gt;&lt;br&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;&lt;br&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;lng&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;br&gt;
    &lt;span class="n"&gt;geom&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;centroid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;br&gt;
    &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;num_points&lt;/span&gt;&lt;br&gt;
&lt;span class="k"&gt;from&lt;/span&gt;&lt;br&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;br&gt;
    &lt;span class="k"&gt;select&lt;/span&gt;&lt;br&gt;
        &lt;span class="n"&gt;ST_CLUSTERDBSCAN&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;geom&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;br&gt;
        &lt;span class="n"&gt;eps&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;br&gt;
        &lt;span class="n"&gt;minpoints&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;over&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;br&gt;
    &lt;span class="k"&gt;order&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt;&lt;br&gt;
        &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;cluster_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;br&gt;
        &lt;span class="n"&gt;geom&lt;/span&gt;&lt;br&gt;
    &lt;span class="k"&gt;from&lt;/span&gt;&lt;br&gt;
        &lt;span class="n"&gt;last_location&lt;/span&gt; &lt;span class="n"&gt;ll&lt;/span&gt;&lt;br&gt;
    &lt;span class="k"&gt;order&lt;/span&gt; &lt;span class="k"&gt;by&lt;/span&gt;&lt;br&gt;
        &lt;span class="n"&gt;cluster_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;cluster_loc&lt;/span&gt;&lt;br&gt;
&lt;span class="k"&gt;where&lt;/span&gt;&lt;br&gt;
    &lt;span class="n"&gt;cluster_loc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cluster_id&lt;/span&gt; &lt;span class="k"&gt;is&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;/p&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
&lt;br&gt;
  &lt;br&gt;
  &lt;br&gt;
  Final thoughts&lt;br&gt;
&lt;/h2&gt;

&lt;p&gt;Mastering server-side clustering empowers you to enhance the efficiency of location-based applications. You can adapt the logic to your specific needs—consider refining the display by a bounding box, dynamically adjusting cluster radius with zoom levels, and exploring numerous possibilities. &lt;/p&gt;

&lt;p&gt;I trust this guide has been a helpful resource on your journey to optimize and elevate your location-based projects. Happy clustering!&lt;/p&gt;

&lt;p&gt;Visit my &lt;a href="https://github.com/jmnmv12/postgis_experiments/tree/main/1.Clustering" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; to check out all of the code used in this post. &lt;/p&gt;




&lt;p&gt;&lt;em&gt;Have questions? Any feedback is welcomed and I will be glad to start a discussion in the comments section.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>clustering</category>
      <category>database</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
