DEV Community

Cover image for (Re-)Districting
donovanrichardson
donovanrichardson

Posted on • Updated on

(Re-)Districting

Here, I'll follow up on my LinkedIn post on an algorithm that I've been designing to delineate cohesive "neighborhoods" or "districts" based on just a road network, for use with political redistricting, determining business service areas, or just approximating areas of common existence. Find it on GitHub here. In the earlier post, I shared a few pictures of the districts that this algorithm created in Manhattan. Since then, I've made a few improvements to the algorithm:

  • Previously due to a bug, the Python implementation of this algorithm could terminate unpredictably early, usually one or two iterations before it was supposed to.
  • The big one: it seems that this algorithm now runs in O(n×log(n)) time on average. Previously it ran O(n2) on average. One caveat: with a worst-case road network that is shaped like an asterisk, this algorithm will run in O(n2) time; in that case, the old algorithm would have run in O(n3) time. Yikes!
  • The Python implementation now outputs the results of the algorithm to CSV. This lets you import the data into a GIS like ArcMap or QGIS in order to display it. I found this to be prettier, easier, faster, and more accessible than plotting a result graph as a figure in Python.

All three of these things improved issues that I thought I just might have to live with. For the sake of my peace of mind, I'm glad that I've been able to fix them!

The Algorithm

In my LinkedIn post, I highlighted two soft constraints for this algorithm:

  • group areas that have small distances between them before grouping areas with long distances between them
  • grouping core areas with peripheral areas before connecting different core areas with each other.

The goal of these constraints is to approximate how different places might have real-life human connections.

Keeping that in mind, here's some pseudocode for the algorithm:

  1. Break graph into n subgraphs, where n is the number of nodes in the graph.
  2. While there are more subgraphs than connected components (i.e. while there are still edges that connect different subgraphs):
    1. For each edge e which connects two separate subgraphs
      1. Where v1 and v2 are the two vertices coincident with e, l1 and l2 are the lengths of all edges coincident on v1 and v2 respectively, lv is the least of these two lengths, and le is the length of edge e, Calculate a score c where c = le * lv
      2. Sort these edges based on their score (done with Postgres index in my implementation)
    2. For each of the sorted edges in order:
      1. join subgraphs incident on this edge if both subgraphs have not been joined in this iteration (each iteration begins on line 2.).
      2. Make a record of the subgraphs existing at the end of this iteration (otherwise the algorithm will terminate as one subgraph for each connected component, with no information on how the graph was delineated into districts)

The cause of the O(n2) problem was that the loop in 2.1. was previously nested within the loop of 2.2; instead of two separate loops iterating through each edge, the nested loops iterated through each edge to calculate the score for just one edge. This error was unnecessary and wasteful, since the result of the loop in 2.1 will essentially be the same within the same iteration, except for results concerning newly-connected components which are not considered in 2.2 anyway.

Gallery

With 9606 junctions and 15882 road segments on OpenStreetMap, San Francisco was too large for my O(n2) algorithm to run within a reasonable time. With the improved algorithm, it was possible to delineate this city into districts. I'll be posting follow-ups with other regions/cities, so be sure to check back!

districts at 7th iteration

San Francisco districts at 7th iteration

districts at 9th iteration

San Francisco districts at 9th iteration

districts at 11th iteration

San Francisco districts at 11th iteration

districts at 13th iteration

San Francisco districts at 13th iteration

districts at 14th iteration

San Francisco districts at 14th iteration

districts at 15th (penultimate) iteration

San Francisco districts at 15th iteration

Top comments (0)