This tutorial explores how to construct and analyze p-adic structures using prefix trees (tries) in Clojure. We will generalize binary Morton codes to other prime bases (like p=3, p=5) to understand p-adic norms and their applications in data analysis.
πThis article is a supplement to part 1.
1. π The Core Idea: Prefix Chains
Any sequence of data, such as a Morton code or spatial coordinates, can be broken down into a chain of its prefixes. This forms a natural hierarchy, where each step in the chain adds more specific information.
For a sequence [a, b, c, d]
, the prefix chain is:
[[a], [a, b], [a, b, c], [a, b, c, d]]
This structure is essentially a linked list or a simple trie, which is the foundation for our analysis. π§±
(defn build-prefix-chain
"Builds a list of all prefixes for a given sequence."
[sequence]
(map #(vec (take % sequence))
(range 1 (inc (count sequence)))))
;; Example π‘
(let [morton-code [1 0 2 1]]
(println "Sequence:" morton-code)
(println "Prefix Chain:" (build-prefix-chain morton-code)))
;; Output:
;; Sequence: [1 0 2 1]
;; Prefix Chain: ([1] [1 0] [1 0 2] [1 0 2 1])
2. π Decomposing Data: Two Perspectives
We can analyze the hierarchical data in our prefix chains in two ways, analogous to Jordan and Cartan decompositions in algebra.
π A. Jordan-like Decomposition (Breadth-First)
This approach processes prefixes level by level, from shortest to longest. It's useful for analyzing data at progressive scales of detail.
(defn jordan-decomposition
"Sorts prefixes by their length (breadth-first)."
[prefix-chains]
(sort-by count (distinct (apply concat prefix-chains))))
;; Example: Analyze all prefixes by depth level π
(let [sequences [[1 0 2] [1 0 1] [2 0]]
all-prefixes (mapcat build-prefix-chain sequences)
decomposed (jordan-decomposition all-prefixes)]
(clojure.pprint/pprint (group-by count decomposed)))
;; Output:
;; {1 #{[1] [2]},
;; 2 #{[1 0], [2 0]},
;; 3 #{[1 0 2], [1 0 1]}}
π― B. Cartan-like Decomposition (Depth-First)
This approach processes the deepest (most specific) prefixes first. It's useful for focusing on fine-grained local details before considering the broader structure.
(defn cartan-decomposition
"Sorts prefixes by length in descending order (depth-first)."
[prefix-chains]
(reverse (jordan-decomposition prefix-chains)))
;; Example: Focus on the most detailed prefixes first π
(let [sequences [[1 0 2] [1 0 1] [2 0]]
all-prefixes (mapcat build-prefix-chain sequences)
decomposed (cartan-decomposition all-prefixes)]
(println decomposed))
;; Output:
;; ([1 0 2] [1 0 1] [1 0] [2 0] [1] [2])
3. π P-adic Norms and Ultrametric Distance
The prefix structure directly leads to the concept of p-adic norms and ultrametric distance. The distance between two sequences is determined by the length of their longest common prefix.
If two sequences A
and B
share a prefix of length k
, their ultrametric distance is p^(-k)
, where p
is the base of the digits (e.g., p=2 for binary, p=3 for ternary). The longer the shared prefix, the closer they are. π―
(defn get-common-prefix-length
"Finds the length of the common prefix between two sequences."
[seq-a seq-b]
(count (take-while true? (map = seq-a seq-b))))
(defn p-adic-distance
"Calculates the p-adic distance between two sequences for a given base p."
[p seq-a seq-b]
(let [k (get-common-prefix-length seq-a seq-b)]
(Math/pow p (- k))))
;; Example with p=3 β‘
(let [p 3
a [1 2 0 1]
b [1 2 0 2]
c [1 2 1 0]]
(println (str "Common prefix length (a, b): " (get-common-prefix-length a b)))
(println (str "Distance(a, b): " (p-adic-distance p a b))) ; should be 3^-3 = 0.037
(println (str "Common prefix length (a, c): " (get-common-prefix-length a c)))
(println (str "Distance(a, c): " (p-adic-distance p a c)))) ; should be 3^-2 = 0.111
This distance function satisfies the strong triangle inequality, d(x, z) <= max(d(x, y), d(y, z))
, which is the defining property of an ultrametric space. β¨
4. π Case Study: Clustering with Ternary (p=3) Morton Codes
Let's apply these ideas to cluster spatial data. Instead of using standard binary (p=2) Morton codes, we can use a ternary (p=3) system. This creates a different hierarchical grouping of the data.
The goal is to convert 3D coordinates into a 1D ternary Morton code, which preserves spatial locality. Then, we can use our p-adic distance to find clusters. πΊοΈ
;; A simplified function to interleave digits for a p-adic Morton code π’
(defn to-base-p [p precision n]
(loop [num n
result ()]
(if (or (zero? num) (= (count result) precision))
(take precision (concat (repeat (- precision (count result)) 0) result))
(recur (quot num p) (cons (rem num p) result)))))
(defn p-ary-morton-3d [x y z p precision]
(let [x' (to-base-p p precision x)
y' (to-base-p p precision y)
z' (to-base-p p precision z)]
(vec (interleave x' y' z'))))
;; Example: Use p=3 for clustering earthquake data π
(let [p 3
precision 4
;; Mock earthquake data (normalized coordinates)
earthquakes [[10 12 5] [11 13 5] [25 26 20]]
;; Generate ternary Morton codes
morton-codes (map #(p-ary-morton-3d (nth % 0) (nth % 1) (nth % 2) p precision) earthquakes)
[morton-a morton-b morton-c] morton-codes]
(println "Earthquake A Morton:" morton-a)
(println "Earthquake B Morton:" morton-b)
(println "Earthquake C Morton:" morton-c)
;; The first two earthquakes are spatially close π
;; Their Morton codes will share a longer prefix.
(println "\nDistance A-B:" (p-adic-distance p morton-a morton-b))
(println "Distance A-C:" (p-adic-distance p morton-a morton-c)))
;; By sorting data based on these Morton codes, we achieve
;; a spatially coherent ordering that can be used for efficient
;; clustering, neighbor searches, and indexing. π
π Conclusion
By viewing data sequences as prefix trees, we have built a practical foundation for understanding p-adic numbers and ultrametric spaces. This tutorial shows that we can go beyond binary systems to construct p-adic fields for any prime p, using it to decompose and analyze data in a hierarchical way.
This approach connects computational geometry with number theory, offering a powerful framework for spatial analysis in Clojure. π
Top comments (0)