🌌A Tutorial on P-adic Structures with Clojure.

#clojure #algorithms #datastructures #bigdata

This tutorial explores how to construct and analyze p-adic structures using prefix trees (tries) in Clojure. We will generalize binary Morton codes to other prime bases (like p=3, p=5) to understand p-adic norms and their applications in data analysis.

👈This article is a supplement to part 1.

1. 🔗 The Core Idea: Prefix Chains

Any sequence of data, such as a Morton code or spatial coordinates, can be broken down into a chain of its prefixes. This forms a natural hierarchy, where each step in the chain adds more specific information.

For a sequence [a, b, c, d], the prefix chain is:
[[a], [a, b], [a, b, c], [a, b, c, d]]

This structure is essentially a linked list or a simple trie, which is the foundation for our analysis. 🧱

(defn build-prefix-chain
  "Builds a list of all prefixes for a given sequence."
  [sequence]
  (map #(vec (take % sequence))
       (range 1 (inc (count sequence)))))

;; Example 💡
(let [morton-code [1 0 2 1]]
  (println "Sequence:" morton-code)
  (println "Prefix Chain:" (build-prefix-chain morton-code)))
;; Output:
;; Sequence: [1 0 2 1]
;; Prefix Chain: ([1] [1 0] [1 0 2] [1 0 2 1])

2. 🔀 Decomposing Data: Two Perspectives

We can analyze the hierarchical data in our prefix chains in two ways, analogous to Jordan and Cartan decompositions in algebra.

📊 A. Jordan-like Decomposition (Breadth-First)

This approach processes prefixes level by level, from shortest to longest. It's useful for analyzing data at progressive scales of detail.

(defn jordan-decomposition
  "Sorts prefixes by their length (breadth-first)."
  [prefix-chains]
  (sort-by count (distinct (apply concat prefix-chains))))

;; Example: Analyze all prefixes by depth level 📏
(let [sequences [[1 0 2] [1 0 1] [2 0]]
      all-prefixes (mapcat build-prefix-chain sequences)
      decomposed (jordan-decomposition all-prefixes)]
  (clojure.pprint/pprint (group-by count decomposed)))
;; Output:
;; {1 #{[1] [2]},
;;  2 #{[1 0], [2 0]},
;;  3 #{[1 0 2], [1 0 1]}}

🎯 B. Cartan-like Decomposition (Depth-First)

This approach processes the deepest (most specific) prefixes first. It's useful for focusing on fine-grained local details before considering the broader structure.

(defn cartan-decomposition
  "Sorts prefixes by length in descending order (depth-first)."
  [prefix-chains]
  (reverse (jordan-decomposition prefix-chains)))

;; Example: Focus on the most detailed prefixes first 🔍
(let [sequences [[1 0 2] [1 0 1] [2 0]]
      all-prefixes (mapcat build-prefix-chain sequences)
      decomposed (cartan-decomposition all-prefixes)]
  (println decomposed))
;; Output:
;; ([1 0 2] [1 0 1] [1 0] [2 0] [1] [2])

3. 📐 P-adic Norms and Ultrametric Distance

The prefix structure directly leads to the concept of p-adic norms and ultrametric distance. The distance between two sequences is determined by the length of their longest common prefix.

If two sequences A and B share a prefix of length k, their ultrametric distance is p^(-k), where p is the base of the digits (e.g., p=2 for binary, p=3 for ternary). The longer the shared prefix, the closer they are. 🎯

(defn get-common-prefix-length
  "Finds the length of the common prefix between two sequences."
  [seq-a seq-b]
  (count (take-while true? (map = seq-a seq-b))))

(defn p-adic-distance
  "Calculates the p-adic distance between two sequences for a given base p."
  [p seq-a seq-b]
  (let [k (get-common-prefix-length seq-a seq-b)]
    (Math/pow p (- k))))

;; Example with p=3 ⚡
(let [p 3
      a [1 2 0 1]
      b [1 2 0 2]
      c [1 2 1 0]]
  (println (str "Common prefix length (a, b): " (get-common-prefix-length a b)))
  (println (str "Distance(a, b): " (p-adic-distance p a b))) ; should be 3^-3 = 0.037
  (println (str "Common prefix length (a, c): " (get-common-prefix-length a c)))
  (println (str "Distance(a, c): " (p-adic-distance p a c)))) ; should be 3^-2 = 0.111

This distance function satisfies the strong triangle inequality, d(x, z) <= max(d(x, y), d(y, z)), which is the defining property of an ultrametric space. ✨

4. 🌍 Case Study: Clustering with Ternary (p=3) Morton Codes

Let's apply these ideas to cluster spatial data. Instead of using standard binary (p=2) Morton codes, we can use a ternary (p=3) system. This creates a different hierarchical grouping of the data.

The goal is to convert 3D coordinates into a 1D ternary Morton code, which preserves spatial locality. Then, we can use our p-adic distance to find clusters. 🗺️

;; A simplified function to interleave digits for a p-adic Morton code 🔢
(defn to-base-p [p precision n]
  (loop [num n
         result ()]
    (if (or (zero? num) (= (count result) precision))
      (take precision (concat (repeat (- precision (count result)) 0) result))
      (recur (quot num p) (cons (rem num p) result)))))

(defn p-ary-morton-3d [x y z p precision]
  (let [x' (to-base-p p precision x)
        y' (to-base-p p precision y)
        z' (to-base-p p precision z)]
    (vec (interleave x' y' z'))))

;; Example: Use p=3 for clustering earthquake data 🌋
(let [p 3
      precision 4
      ;; Mock earthquake data (normalized coordinates)
      earthquakes [[10 12 5] [11 13 5] [25 26 20]]

      ;; Generate ternary Morton codes
      morton-codes (map #(p-ary-morton-3d (nth % 0) (nth % 1) (nth % 2) p precision) earthquakes)
      [morton-a morton-b morton-c] morton-codes]

  (println "Earthquake A Morton:" morton-a)
  (println "Earthquake B Morton:" morton-b)
  (println "Earthquake C Morton:" morton-c)

  ;; The first two earthquakes are spatially close 📍
  ;; Their Morton codes will share a longer prefix.
  (println "\nDistance A-B:" (p-adic-distance p morton-a morton-b))
  (println "Distance A-C:" (p-adic-distance p morton-a morton-c)))

;; By sorting data based on these Morton codes, we achieve
;; a spatially coherent ordering that can be used for efficient
;; clustering, neighbor searches, and indexing. 🚀

🎉 Conclusion

By viewing data sequences as prefix trees, we have built a practical foundation for understanding p-adic numbers and ultrametric spaces. This tutorial shows that we can go beyond binary systems to construct p-adic fields for any prime p, using it to decompose and analyze data in a hierarchical way.

This approach connects computational geometry with number theory, offering a powerful framework for spatial analysis in Clojure. 💎

Buy me a coffee if this helped! ☕