DEV Community

Cover image for 🌌 High-Performance 3D Spatial Data Sorting with Morton Codes in Clojure.
Yoshihiro Hasegawa
Yoshihiro Hasegawa

Posted on • Edited on

🌌 High-Performance 3D Spatial Data Sorting with Morton Codes in Clojure.

This is Part 2 of our ultrametric series!
👈 Part 1: Building an Ultra-metric Tree in Clojure: From Radix Filters to p-adic Distance

In our previous article, we explored ultrametric spaces and p-adic distances. Now, let's take that foundation and apply it to a real-world spatial problem: efficiently sorting 3D data while preserving spatial locality.

Today, I'll show you how to combine Morton Codes (Z-order curves) with the ultrametric bucket sorting techniques we developed previously, creating a blazingly fast spatial sorting algorithm that keeps nearby points close together in the sorted order.

🎯 What Are We Building?

Let's solve this common problem:

;; We have scattered 3D points...
(def points [(Point3D 10 10 10)
             (Point3D 80 80 80)  
             (Point3D 11 10 10)  ; Close to the first point!
             (Point3D 81 82 80)]) ; Close to the second point!

;; Regular sorting destroys spatial relationships 😢
;; But with Morton Codes...

;; After sorting: nearby points are adjacent! 🎉
;; [(Point3D 10 10 10) (Point3D 11 10 10) (Point3D 80 80 80) (Point3D 81 82 80)]
Enter fullscreen mode Exit fullscreen mode

🔗 Building on Ultrametric Foundations

In Part 1, we learned about ultrametric spaces where the strong triangle inequality holds:

d(x,z) ≤ max(d(x,y), d(y,z))
Enter fullscreen mode Exit fullscreen mode

This property is perfect for hierarchical data structures. Now we'll leverage this for 3D spatial sorting using Morton Codes!

🧮 What's a Morton Code?

Building on our ultrametric understanding from Part 1, a Morton Code maps multi-dimensional coordinates to a single integer while preserving spatial locality:

3D coordinate (x, y, z) → Single integer value
✨ Spatial neighbors remain numeric neighbors!
Enter fullscreen mode Exit fullscreen mode

How It Works (Bit Interleaving)

;; Example: Calculate Morton Code for (5, 3, 2)

;; 1. Convert each coordinate to binary
x = 5 = 101
y = 3 = 011  
z = 2 = 010

;; 2. Interleave bits (z,y,x order)
;; x: _ _ 1 _ _ 0 _ _ 1
;; y: _ 0 _ 1 _ _ 1 _ _  
;; z: 0 _ _ 1 _ _ 0 _ _

;; 3. Combine them
;; Result: 001101001₂ = 105₁₀
Enter fullscreen mode Exit fullscreen mode

🚀 Clojure Implementation: Step by Step

Step 1: Basic Morton Code Generation

(defrecord Point3D [x y z])

(defn- spread-bits
  "Spread bits of a number by inserting two zeros between each bit"
  [n]
  (loop [result 0, i 0, val n]
    (if (zero? val)
      result
      (let [bit (bit-and val 1)]
        (recur (bit-or result (bit-shift-left bit (* i 3)))
               (inc i)
               (bit-shift-right val 1))))))

(defn point->morton-code
  "Convert 3D point to Morton code"
  [point]
  (let [x' (spread-bits (:x point))
        y' (spread-bits (:y point))
        z' (spread-bits (:z point))]
    ;; Combine in z,y,x order
    (bit-or x' (bit-shift-left y' 1) (bit-shift-left z' 2))))
Enter fullscreen mode Exit fullscreen mode

This bit-spreading technique is the heart of Morton encoding. By interleaving the bits of our x, y, and z coordinates, we create a single number that preserves spatial relationships. Points that are close in 3D space will have similar Morton codes, making them appear close together when sorted numerically.

The magic happens in the spread-bits function - it takes a coordinate like 5 (binary 101) and transforms it into 001000001 by inserting two zeros between each original bit. When we combine the spread bits from all three coordinates, we get a Morton code that acts like a "spatial address" for our point.

Step 2: Ultrametric Bucket Sort (From Part 1)

Remember our ultrametric bucket sorting from the previous article? Let's adapt it for spatial data with hierarchical bucket partitioning:

(defn- sort-level
  "Recursive hierarchical sort: group by each byte position"
  [arrays depth]
  (if (<= (count arrays) 1)
    arrays
    (let [;; Group by byte value at 'depth' position
          groups (group-by #(let [b (get % depth)]
                              (when (some? b) (bit-and 0xFF b)))
                           arrays)
          ;; Shorter arrays come first
          shorter-arrays (get groups nil [])
          ;; Sort groups by byte value
          sorted-groups (->> (dissoc groups nil) (sort-by key))]
      ;; Recursively process each group
      (concat shorter-arrays
              (mapcat (fn [[_ group-arrays]]
                        (sort-level group-arrays (inc depth)))
                      sorted-groups)))))

(defn ultrametric-sort
  "Ultrametric bucket sort algorithm"
  [byte-arrays]
  (sort-level byte-arrays 0))
Enter fullscreen mode Exit fullscreen mode

This recursive sorting approach leverages the ultrametric property we discussed in Part 1. Each level of recursion examines one byte of our Morton codes, effectively partitioning our 3D space into smaller and smaller cubic regions.

The beauty of this approach is that it naturally creates a hierarchical spatial index. Points in the same "bucket" at each level are guaranteed to be spatially close, and the recursive structure means we're building an implicit octree (8-way spatial tree) as we sort.

Notice how we handle arrays shorter than the current depth - these represent points that have identical Morton codes up to that byte position, meaning they're in the same small spatial region.

Step 3: Complete 3D Spatial Sort

(defn long->byte-array
  "Convert Morton code to sortable byte array"
  [l]
  (let [buffer (java.nio.ByteBuffer/allocate 8)]
    (.putLong buffer l)
    (.array buffer)))

(defn sort-3d-points
  "Sort 3D points in Morton order"
  [points]
  (let [;; Calculate Morton code for each point
        morton-keys (map #(-> % point->morton-code long->byte-array) points)
        ;; Maintain key-to-point mapping
        key-point-map (zipmap morton-keys points)
        ;; Sort by Morton codes
        sorted-keys (ultrametric-sort morton-keys)]
    ;; Restore original points in sorted order
    (map key-point-map sorted-keys)))
Enter fullscreen mode Exit fullscreen mode

The long->byte-array conversion is crucial here because it allows us to treat our Morton codes as sortable byte sequences. Java's ByteBuffer ensures we get a consistent big-endian representation, which preserves the numerical ordering we need.

The key insight is that by converting our Morton codes to byte arrays and then applying our ultrametric bucket sort, we're essentially performing a radix sort on the spatial coordinates. This gives us O(n) average-case performance for spatially distributed data, which is significantly better than comparison-based sorts.

🎮 Let's See It in Action!

(defn demo-spatial-sort []
  (println "=== 3D Spatial Sorting Demo ===")

  ;; Test data: intentionally scrambled order
  (let [points [(->Point3D 80 80 80)   ; Far cluster
                (->Point3D 10 10 10)   ; Near cluster 1
                (->Point3D 81 82 80)   ; Far cluster (close to first)
                (->Point3D 11 10 10)   ; Near cluster 1
                (->Point3D 10 11 10)   ; Near cluster 1
                (->Point3D 50 50 50)]  ; Middle point

        sorted-points (sort-3d-points points)]

    (println "Before sorting:")
    (doseq [[i p] (map-indexed vector points)]
      (println (format "%d: %s" i p)))

    (println "\nAfter Morton-order sorting:")
    (doseq [[i p] (map-indexed vector sorted-points)]
      (println (format "%d: %s" i p)))

    (println "\n👀 Notice how nearby points are now adjacent!")))

;; Run it
(demo-spatial-sort)
Enter fullscreen mode Exit fullscreen mode

Sample Output:

Before sorting:
0: Point3D{x=80, y=80, z=80}
1: Point3D{x=10, y=10, z=10}
2: Point3D{x=81, y=82, z=80}
...

After Morton-order sorting:
0: Point3D{x=10, y=10, z=10}
1: Point3D{x=11, y=10, z=10}  ← Close!
2: Point3D{x=10, y=11, z=10}  ← Close!
3: Point3D{x=50, y=50, z=50}
4: Point3D{x=80, y=80, z=80}
5: Point3D{x=81, y=82, z=80}  ← Close!
Enter fullscreen mode Exit fullscreen mode

The transformation you see here isn't just cosmetic - it has profound implications for spatial algorithms. In the sorted order, we've created natural "neighborhoods" where scanning a small range of consecutive elements will give us spatially adjacent points.

This property is what makes Morton-ordered data so powerful for applications like collision detection in games, where you typically want to find all objects within a certain radius of a target point. Instead of checking every single object (O(n) operation), you can focus on a small consecutive range in the sorted array.

🔍 Efficient Nearest Neighbor Search

With Morton-ordered data, neighbor searches become lightning fast:

(defn euclidean-distance
  "Calculate distance between two 3D points"
  [p1 p2]
  (let [dx (- (:x p1) (:x p2))
        dy (- (:y p1) (:y p2))
        dz (- (:z p1) (:z p2))]
    (Math/sqrt (+ (* dx dx) (* dy dy) (* dz dz)))))

(defn find-nearby-points
  "Efficiently find points near a target location"
  [sorted-points target-point max-distance]
  ;; Morton order means nearby points are close in the list
  ;; Can be further optimized with binary search
  (filter #(< (euclidean-distance % target-point) max-distance)
          sorted-points))
Enter fullscreen mode Exit fullscreen mode

The efficiency gain from Morton ordering becomes even more apparent with nearest neighbor searches. In a randomly ordered list, finding nearby points requires examining the entire dataset. But with Morton ordering, nearby points cluster together in the sorted sequence.

For even better performance, you could implement a binary search to quickly locate the approximate position of your target point in the Morton-ordered array, then expand outward from that position. This can reduce search complexity from O(n) to O(log n + k), where k is the number of nearby points found.

🚀 Tail-Recursive Optimization (Big Data Ready)

For Clojure, let's make it stack-safe with tail recursion:

(defn- sort-level-optimized
  "Tail-recursive optimized version for large datasets"
  [arrays depth acc]
  (if (<= (count arrays) 1)
    (into acc arrays)
    (let [groups (group-by #(let [b (get % depth)]
                              (when (some? b) (bit-and 0xFF (int b))))
                           arrays)
          shorter-arrays (get groups nil [])
          sorted-groups (->> (dissoc groups nil) (sort-by key))
          new-acc (into acc shorter-arrays)]
      ;; Tail recursion to avoid stack overflow
      (loop [remaining-groups sorted-groups
             current-acc new-acc]
        (if (empty? remaining-groups)
          current-acc
          (let [[_ group-arrays] (first remaining-groups)
                sorted-group (sort-level-optimized group-arrays (inc depth) [])]
            (recur (rest remaining-groups)
                   (into current-acc sorted-group))))))))

(defn ultrametric-sort-optimized
  "Stack-safe version for large datasets"
  [byte-arrays]
  (sort-level-optimized byte-arrays 0 []))
Enter fullscreen mode Exit fullscreen mode

This optimization is particularly important for Clojure applications processing large spatial datasets. The tail-recursive approach ensures we don't hit stack overflow errors when dealing with deeply nested spatial hierarchies.

The accumulator pattern (acc) allows us to build up our result iteratively rather than through nested function calls. This is especially beneficial when processing point clouds from LiDAR scans, astronomical data, or other scientific datasets that can contain millions of spatial coordinates.

🎯 Real-World Applications

1. Game Development: Collision Detection

(defrecord GameObject [id position velocity])

(defn update-collision-system [game-objects]
  (let [sorted-positions (sort-3d-points (map :position game-objects))]
    ;; Check only nearby objects instead of all pairs
    ;; Reduces complexity from O(n²) to O(n log n)!
    (map #(find-nearby-points sorted-positions (:position %) collision-radius)
         game-objects)))
Enter fullscreen mode Exit fullscreen mode

These applications demonstrate the versatility of Morton-ordered spatial sorting. The key advantage across all these domains is cache locality - because spatially nearby points are stored consecutively in memory, your CPU cache becomes much more effective.

In game development, this translates to faster collision detection and spatial queries. In GIS applications, it means more efficient map rendering and spatial database queries. For scientific computing, it enables better performance in particle simulations and mesh processing algorithms.

2. Geographic Information Systems (GIS)

(def tokyo-landmarks
  [(->Point3D 139.691706 35.689487 0)  ; Tokyo Station
   (->Point3D 139.700272 35.658034 0)  ; Tokyo Tower  
   (->Point3D 139.796230 35.712776 0)  ; Tokyo Skytree
   (->Point3D 139.745433 35.658031 0)]) ; Roppongi Hills

(defn find-nearby-landmarks [user-location radius-km]
  (let [sorted-landmarks (sort-3d-points tokyo-landmarks)]
    (find-nearby-points sorted-landmarks user-location radius-km)))
Enter fullscreen mode Exit fullscreen mode

3. Scientific Computing: Spatial Clustering

(defn spatial-clustering [data-points cluster-radius]
  "Group nearby experimental data points"
  (let [sorted-data (sort-3d-points data-points)]
    (map #(find-nearby-points sorted-data % cluster-radius)
         data-points)))
Enter fullscreen mode Exit fullscreen mode

🔥 Why This Approach Rocks

1. Spatial Locality Preservation

  • 3D neighbors stay close after sorting
  • Perfect for database page locality optimization

2. Efficient Range Queries

  • Morton order enables binary search
  • Simpler than R-trees, similar performance

3. Parallelization-Friendly

  • Each level can be processed independently
  • Easy to scale with ForkJoinPool

4. Cache-Efficient

  • Sequential memory access patterns
  • CPU cache-friendly algorithms

🤔 Limitations & Solutions

Limitation 1: Curse of Dimensionality

High dimensions (>10D) reduce effectiveness
→ Solution: Use PCA for dimensionality reduction first

Limitation 2: Data Distribution Sensitivity

Extremely skewed data hurts performance
→ Solution: Adaptive hierarchical partitioning

Limitation 3: Integer Coordinates Only

Morton codes work best with integer coordinates
→ Solution: Scale floating-point values appropriately

⚡ Performance Optimization Tips

Parallel Processing Version

(defn ultrametric-sort-parallel
  "Parallel version for large datasets"
  ([byte-arrays] (ultrametric-sort-parallel byte-arrays 1000))
  ([byte-arrays threshold]
   (if (<= (count byte-arrays) threshold)
     (ultrametric-sort byte-arrays)
     ;; Use ForkJoinPool for large datasets
     (let [groups (group-by first-byte byte-arrays)
           futures (map #(.submit ForkJoinPool/commonPool 
                                  (fn [] (ultrametric-sort %)))
                       (vals groups))]
       (mapcat #(.get %) futures)))))
Enter fullscreen mode Exit fullscreen mode

Magic Numbers for Bit Spreading (Advanced)

(defn- spread-bits-fast
  "Optimized bit spreading using magic numbers"
  [n]
  (let [n (bit-and n 0x3ff)] ; Limit to 10 bits
    (-> n
        (bit-and 0x000003ff)
        (bit-or (bit-shift-left (bit-and n 0x000ffc00) 2))
        (bit-and 0x0300f00f)
        (bit-or (bit-shift-left (bit-and n 0x00000300) 2))
        (bit-and 0x030c30c3)
        (bit-or (bit-shift-left (bit-and n 0x00000030) 2))
        (bit-and 0x09249249))))
Enter fullscreen mode Exit fullscreen mode

🎉 Conclusion

Building on our ultrametric foundations from Part 1, Morton Code spatial sorting in Clojure offers:

  • Relatively simple implementation
  • Excellent spatial locality preservation
  • Natural parallelization opportunities
  • Wide range of practical applications

This technique shines in 3D games, GIS systems, scientific simulations, and any domain where spatial relationships matter.

The functional programming paradigms of Clojure (immutability, higher-order functions, lazy evaluation) pair beautifully with algorithmic spatial processing!


🚀 What's Next?

In my next post, I'll benchmark this implementation against:

  • Standard sorting algorithms
  • R-tree spatial indices
  • Parallel processing variations

Full source code is available on GitHub!


Have you worked with spatial data algorithms? What challenges have you faced? Drop a comment below - I'd love to hear your experiences!

Follow me for more functional programming and algorithm deep-dives! 🚀

Buy me a coffee if this helped! ☕

Top comments (0)