Toki Hirose

Posted on Apr 4 • Edited on Apr 25

4. Constructing a Station-Level Statistical Manifold with Dual Flat Structure from Pedestrian Trajectories

#analytics #datascience #learning

Note: I use AI assistance to draft and polish the English, but the analysis, interpretation, and core ideas are my own. Learning to write technical English is itself part of this project.

Introduction

In this article, I extend the pedestrian trajectory feature distributions measured in Article 3 to compare pedestrian behavior across multiple urban locations. Rather than applying PCA, I construct a statistical manifold on which each location's fitted distribution is a single point, and I compare stations using KL divergences computed from the manifold's dual flat structure. Placing every observation point on the same manifold makes it possible to relate differences in pedestrian behavioral dynamics to differences in urban spatial structure, a connection developed further in the next article.

The key idea is to treat the trajectories observed at one location not as isolated events but as a distribution, so that each observation point becomes a single point on an information-geometric manifold. The dual flat structure gives each point two coordinate systems, natural parameters and expectation parameters, and comparing stations along the geodesics of each system lets us observe non-linear behavioral differences in a geometrically precise way.

Motivation

At the end of last year, I read Spivak's "Can the Subaltern Speak?" I felt that the method of searching for where silence lies is similar to acoustic reflection surveying in geophysics. Acoustic reflection surveying sends sound into the ground; where strata boundaries exist, reflection intensity changes. Graphing the vertical intensity changes reveals stratum surfaces as regions of high reflection. Where liquid exists underground, reflections become chaotic rather than coherent.

If ideology is treated as the medium being surveyed, well-discussed places show clear strata surfaces, while places that go undiscussed become regions that this observation method cannot measure. As in acoustic reflection surveying, the purpose of this series is to apply some action, capture the reflection surfaces, and identify where the invisible places are.

Around the same time, I encountered information geometry. KL divergence measures the asymmetric difference between two distributions. I thought that invisible differences between places — the kind that symmetric metrics erase — might emerge precisely in that asymmetry.

Methodological Foundation

Why Information Geometry Over PCA?

PCA's essential operation is dimensionality reduction — discarding data. It treats observations as points in Euclidean space and projects onto directions of maximum variance, focusing on the observations themselves and summarizing inter-indicator relationships linearly. Euclidean distance between parameters does not correspond to "statistical distinguishability."

Information geometry addresses this through its dual flat structure:

Distributions as manifold points: Entire distributions are the unit of analysis, not individual observations
Fisher metric: The unique (up to scaling) invariant metric on statistical manifolds, measuring distance as statistical distinguishability, that is, how easily two distributions can be separated from data
Dual structure (e-connection / m-connection): Naturally separates observational indicators from distributional parameters:
- e-connection: captures changes in natural parameters (generative mechanisms)
- m-connection: captures changes in expectation parameters (observable statistics)
- The same observed change may reflect different magnitudes of change in the generative mechanism depending on the location on the manifold, an asymmetry that PCA cannot capture in principle

Station as a Point on the Manifold

Data was collected at five locations: Ginza 1-chome, Ginza 2-chome, Shinjuku, Kamata, and Shinbashi.

point_name	tracks	lat	lon	point attribution
Ginza1	618	35.67380	139.76772	Shopping / Tourism
Ginza2	517	35.67385	139.76775	Shopping / Tourism
Shinjuku	989	35.69183	139.70259	Large-scale commercial / Transit hub
Kamata	745	35.56262	139.71545	Commercial / Transit hub
Shinbashi	1106	35.66575	139.75797	Business

The point attribution labels indicate the urban function and regional character of each location. "Shopping/Tourism" suggests pedestrian influx is primarily for sightseeing and shopping; "Business" indicates a high proportion of commuting and work-related use.

Shinjuku, as a major terminal station, is a complex point where commercial and transit functions overlap. Although it shares a commercial character with Ginza, Shinjuku is expected to show greater diversity in travel purpose and speed distribution, with more pronounced mixing of stopping and through-traffic behavior, so its manifold point should reflect broader distributions than Ginza's. Each station's point on the manifold is determined by its trajectory feature distribution parameters; point attribution functions as an "urban context" tag attached to that point.

Note: Ginza 1 and 2 are at adjacent intersections and overlap at this map scale.

Each station is represented by fitting the same 8-feature distribution schema established in Article 3:

Feature	Distribution	Parameters
`real_speed_mean`	Normal	(μ, σ)
`real_speed_cv`	Log-normal	(s, loc, scale)
`real_accel_abs_mean`	Half-normal	(loc, σ)
`stop_ratio`	Beta	(α, β)
`real_straightness`	Beta	(α, β)
`speed_skew`	Gamma	(a, loc, scale)
`decel_ratio`	Beta	(α, β)
`duration_sec`	Log-normal	(s, loc, scale)

The manifold point for station s is the concatenated vector of all fitted parameters:

p_s = (μ_speed, σ_speed, s_cv, …, s_dur, scale_dur) ∈ M_U

As a concrete example, the manifold coordinates for Shinjuku are:

Feature	Distribution	Fitted Parameters
`real_speed_mean`	Normal	μ=1.4625, σ=0.5319
`real_speed_cv`	Log-normal	s=0.3845, loc=0, scale=0.8025
`real_accel_abs_mean`	Half-normal	loc=0, σ=28.2603
`stop_ratio`	Beta	α=0.7074, β=7.8578
`real_straightness`	Beta	α=0.8339, β=0.7307
`speed_skew`	Gamma	a=1.861, loc=0, scale=1.1513
`decel_ratio`	Beta	α=31.5501, β=31.492
`duration_sec`	Log-normal	s=0.8659, loc=0, scale=2.1172

This parameter vector defines Shinjuku's single point on M_U. The same procedure is applied to all five stations to populate the manifold.
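As a rough illustration of how such a parameter vector can be assembled, here is a minimal sketch using closed-form maximum-likelihood estimates for three of the families (Normal, Log-normal, Half-normal), with loc fixed at 0. The parameter names in the table resemble `scipy.stats` conventions, but the article does not state which fitting tool was used, so this is an assumption-laden stand-in rather than the actual pipeline. The input samples are synthetic, the function names are hypothetical, and the Beta and Gamma fits (which need numerical optimization) are left out.

```python
import math
import random

def fit_normal(xs):
    """Closed-form MLE for Normal: returns (mu, sigma)."""
    n = len(xs)
    mu = sum(xs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)
    return mu, sigma

def fit_lognormal(xs):
    """MLE for Log-normal with loc fixed at 0: returns (s, scale)."""
    m, s = fit_normal([math.log(x) for x in xs])
    return s, math.exp(m)

def fit_halfnormal(xs):
    """MLE for Half-normal with loc fixed at 0: returns sigma."""
    return math.sqrt(sum(x * x for x in xs) / len(xs))

# Synthetic stand-in samples (NOT the real trajectory features), seeded.
rng = random.Random(0)
speeds = [rng.gauss(1.46, 0.53) for _ in range(5000)]
durations = [rng.lognormvariate(0.75, 0.87) for _ in range(5000)]
accels = [abs(rng.gauss(0.0, 28.0)) for _ in range(5000)]

mu, sigma = fit_normal(speeds)
s_dur, scale_dur = fit_lognormal(durations)
sigma_acc = fit_halfnormal(accels)

# Concatenate the fitted parameters in a fixed feature order to get a
# (partial) manifold point for this hypothetical station.
p_s = [mu, sigma, s_dur, scale_dur, sigma_acc]
print([round(v, 3) for v in p_s])
```

Keeping the feature order fixed across stations is what makes the concatenated vectors comparable coordinate by coordinate.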

Dual Flat Structure

M_U is a dually flat manifold because each feature distribution belongs to an exponential family. The product of independent exponential families inherits this structure with:

Natural parameters θ: Canonical exponential family parametrization
Expectation parameters η: E[T(X)] where T(X) are sufficient statistics
Related by Legendre transform: η = ∇_θ ψ(θ) where ψ is the log-partition function

For each distribution family (loc is fixed at 0, as in the fitted values above; in the Gamma and Beta rows, ψ denotes the digamma function, not the log-partition function):

Normal(μ,σ): θ = (μ/σ², −1/(2σ²)), η = (μ, μ²+σ²)
Log-normal(s,scale): θ = (m/s², −1/(2s²)) where m = log(scale), η = (m, m²+s²)
Half-normal(σ): θ = −1/(2σ²), η = σ²
Gamma(a,scale): θ = (a−1, −1/scale), η = (ψ(a)+log(scale), a·scale)
Beta(α,β): θ = (α−1, β−1), η = (ψ(α)−ψ(α+β), ψ(β)−ψ(α+β))

For example, applying real_speed_mean (Normal distribution) to the dual flat structure:

Natural parameters θ
- θ₁ = μ/σ²: the mean weighted by precision. If pedestrian speeds are highly variable (large σ²), this value is small, a weak signal. If everyone walks at nearly the same speed, it is large, a strong signal.
- θ₂ = −1/(2σ²): encodes the precision of the distribution. A narrow distribution yields a large negative value; a broad distribution yields a value close to zero.
Expectation parameters η
- η₁ = μ: simply the mean walking speed — directly readable from the data.
- η₂ = μ² + σ²: the second moment, encoding both the mean and the spread of speeds.
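The Legendre relation η = ∇_θ ψ(θ) can be checked numerically for this Normal example. The sketch below uses the standard log-partition function of the Normal family, ψ(θ) = −θ₁²/(4θ₂) − ½·log(−2θ₂) (dropping an additive constant), and compares a finite-difference gradient with the closed-form η. The function names are my own and only illustrative.

```python
import math

def theta_from_normal(mu, sigma):
    """Natural parameters of Normal(mu, sigma)."""
    return (mu / sigma**2, -1.0 / (2.0 * sigma**2))

def log_partition(theta):
    """psi(theta) for the Normal family, without the constant 0.5*log(2*pi)."""
    t1, t2 = theta
    return -t1**2 / (4.0 * t2) - 0.5 * math.log(-2.0 * t2)

def numeric_gradient(f, theta, h=1e-6):
    """Central finite-difference gradient of f at theta."""
    grad = []
    for i in range(len(theta)):
        up = list(theta)
        dn = list(theta)
        up[i] += h
        dn[i] -= h
        grad.append((f(up) - f(dn)) / (2.0 * h))
    return grad

# Shinjuku's fitted real_speed_mean parameters from the table above.
mu, sigma = 1.4625, 0.5319
theta = theta_from_normal(mu, sigma)

eta_numeric = numeric_gradient(log_partition, theta)
eta_closed = (mu, mu**2 + sigma**2)

print("theta      :", theta)
print("eta numeric:", eta_numeric)
print("eta closed :", eta_closed)
```

The two η vectors agree up to finite-difference error, which is the concrete meaning of "related by Legendre transform" for this feature.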

Applying these conversions to Shinjuku's fitted parameters yields a 15-dimensional coordinate vector in each system (θ-dim = η-dim = 15, one dimension per sufficient statistic across all 8 features):

θ (Shinjuku): [ 5.1697e+00 -1.7674e+00 -1.4882e+00 -3.3827e+00 -6.0000e-04
               -2.9260e-01  6.8578e+00 -1.6610e-01 -2.6930e-01  8.6100e-01
               -8.6860e-01  3.0550e+01  3.0492e+01  1.0005e+00 -6.6690e-01]

η (Shinjuku): [ 1.4625e+00  2.4218e+00 -2.2000e-01  1.9620e-01  7.9864e+02
               -3.2875e+00 -9.1700e-02 -9.8470e-01 -1.2312e+00  4.6990e-01
                2.1425e+00 -7.0020e-01 -7.0210e-01  7.5010e-01  1.3124e+00]

These two vectors are the dual coordinates of Shinjuku's point on M_U. The θ-coordinates encode the generative mechanism (natural parameters), while the η-coordinates encode the observable statistics (expectation parameters). Their relationship via the Legendre transform is what makes geodesic and divergence calculations tractable.
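For readers who want to reproduce the two vectors, the following sketch applies the per-family conversions to Shinjuku's fitted parameters, in the same feature order as the table. It uses only the standard library, so the digamma function is approximated with a recurrence plus an asymptotic series; in practice a library routine would be the more natural choice. Function names are hypothetical, and small differences from the printed vectors are expected because the fitted parameters shown above are rounded.

```python
import math

def digamma(x):
    """Digamma for x > 0: shift x upward by recurrence, then asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series

# Each converter returns (theta, eta) as two lists.
def normal(mu, sigma):
    return [mu / sigma**2, -1 / (2 * sigma**2)], [mu, mu**2 + sigma**2]

def lognormal(s, scale):
    # Same as a Normal on the log scale with m = log(scale).
    return normal(math.log(scale), s)

def halfnormal(sigma):
    return [-1 / (2 * sigma**2)], [sigma**2]

def gamma(a, scale):
    return [a - 1, -1 / scale], [digamma(a) + math.log(scale), a * scale]

def beta(a, b):
    d = digamma(a + b)
    return [a - 1, b - 1], [digamma(a) - d, digamma(b) - d]

# Shinjuku's fitted parameters, in the table's feature order (loc = 0).
shinjuku = [
    normal(1.4625, 0.5319),      # real_speed_mean
    lognormal(0.3845, 0.8025),   # real_speed_cv
    halfnormal(28.2603),         # real_accel_abs_mean
    beta(0.7074, 7.8578),        # stop_ratio
    beta(0.8339, 0.7307),        # real_straightness
    gamma(1.861, 1.1513),        # speed_skew
    beta(31.5501, 31.492),       # decel_ratio
    lognormal(0.8659, 2.1172),   # duration_sec
]

theta = [v for t, _ in shinjuku for v in t]
eta = [v for _, e in shinjuku for v in e]
print(len(theta), len(eta))
print([round(v, 4) for v in theta])
print([round(v, 4) for v in eta])
```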

Fisher Information Metric

The Fisher information matrix G for M_U is block diagonal due to feature independence:
G = diag(G₁, G₂, …, G₈)

Each block G_i is computed analytically for the corresponding distribution family.
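To make the block structure concrete, here is a small sketch that assembles G from two analytic blocks and checks the "statistical distinguishability" reading of the metric: for a small parameter change, KL divergence is approximately half the squared Fisher length. The article does not say which coordinates its blocks are expressed in, so I assume the source parameters (μ, σ) here, where the Normal block is diag(1/σ², 2/σ²) and the Half-normal block is 2/σ². The helper names and the step sizes are illustrative only.

```python
import math

def fisher_normal(mu, sigma):
    """Fisher block of Normal in (mu, sigma) coordinates."""
    return [[1.0 / sigma**2, 0.0], [0.0, 2.0 / sigma**2]]

def fisher_halfnormal(sigma):
    """Fisher block of Half-normal (loc = 0) in the sigma coordinate."""
    return [[2.0 / sigma**2]]

def block_diag(blocks):
    """Place square blocks along the diagonal of a zero matrix."""
    n = sum(len(b) for b in blocks)
    G = [[0.0] * n for _ in range(n)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, v in enumerate(row):
                G[offset + i][offset + j] = v
        offset += len(b)
    return G

def kl_normal(mu1, s1, mu2, s2):
    """Closed-form KL( N(mu1, s1) || N(mu2, s2) )."""
    return math.log(s2 / s1) + (s1**2 + (mu1 - mu2) ** 2) / (2 * s2**2) - 0.5

# Two of Shinjuku's features: real_speed_mean and real_accel_abs_mean.
mu, sigma = 1.4625, 0.5319
G = block_diag([fisher_normal(mu, sigma), fisher_halfnormal(28.2603)])

# Local check on the Normal block with a small, arbitrary perturbation.
d_mu, d_sigma = 0.01, 0.005
kl = kl_normal(mu, sigma, mu + d_mu, sigma + d_sigma)
quad = 0.5 * (G[0][0] * d_mu**2 + G[1][1] * d_sigma**2)
print(kl, quad)
```

The two printed numbers should be close but not identical, since the quadratic form is only the leading term of the divergence.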

Results

Manifold Construction

I applied the schema to five JRE Line stations with available trajectory data. Each station's manifold coordinates were computed by fitting distributions to trajectory features extracted as in Article 3.

KL Divergence Between Stations

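Because the eight features are modeled as independent (the same assumption that makes G block diagonal), the KL divergence between stations p and q is the sum of the per-feature divergences: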
D_KL(p ‖ q) = Σᵢ D_KL(pᵢ ‖ qᵢ)

For example, for a Normal-distributed feature:

D_KL(𝒩(μ₁,σ₁) ‖ 𝒩(μ₂,σ₂)) = log(σ₂/σ₁) + (σ₁² + (μ₁−μ₂)²) / (2σ₂²) − 1/2

Each distribution family (Log-normal, Half-normal, Gamma, Beta) has its own closed-form expression, computed analytically in the same way.
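The link between these closed forms and the dual coordinates can be sketched for the Normal case. For an exponential family, D_KL(p ‖ q) = ψ(θ_q) − ψ(θ_p) − η_p·(θ_q − θ_p), a Bregman divergence of the log-partition function, and the code below checks that this agrees with the closed form and then sums over features. Station A uses Shinjuku's fitted values for two features; station B is entirely hypothetical, since only Shinjuku's parameters are listed in this article. A Log-normal feature is handled as a Normal on the log scale, which leaves KL unchanged.

```python
import math

def kl_normal(mu1, s1, mu2, s2):
    """Closed-form KL( N(mu1, s1) || N(mu2, s2) )."""
    return math.log(s2 / s1) + (s1**2 + (mu1 - mu2) ** 2) / (2 * s2**2) - 0.5

def theta(mu, s):
    return (mu / s**2, -1 / (2 * s**2))

def eta(mu, s):
    return (mu, mu**2 + s**2)

def psi(t):
    """Normal log-partition function, additive constant dropped."""
    return -t[0] ** 2 / (4 * t[1]) - 0.5 * math.log(-2 * t[1])

def kl_bregman(mu1, s1, mu2, s2):
    """Same KL, written with the dual coordinates of p and q."""
    tp, tq, ep = theta(mu1, s1), theta(mu2, s2), eta(mu1, s1)
    return psi(tq) - psi(tp) - sum(e * (b - a) for e, a, b in zip(ep, tp, tq))

# (mu, sigma) per feature; the log-normal feature is stored as (log(scale), s).
station_a = {  # Shinjuku's fitted values from the table above
    "real_speed_mean": (1.4625, 0.5319),
    "duration_sec": (math.log(2.1172), 0.8659),
}
station_b = {  # hypothetical comparison station
    "real_speed_mean": (1.20, 0.35),
    "duration_sec": (math.log(3.0), 0.60),
}

def kl_total(p, q, kl=kl_normal):
    """Sum of per-feature divergences under the independence assumption."""
    return sum(kl(*p[f], *q[f]) for f in p)

print(kl_total(station_a, station_b), kl_total(station_a, station_b, kl=kl_bregman))
print(kl_total(station_b, station_a))  # reverse direction differs
```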

In the divergence matrix below, rows are p and columns are q. Values range from 0.152 for D(Ginza2‖Ginza1) to 1.566 for D(Kamata‖Shinbashi), so some pairs differ far more than others.

	Ginza1	Ginza2	Shinjuku	Kamata	Shinbashi
Ginza1	0.000	0.200	0.175	0.165	0.665
Ginza2	0.152	0.000	0.319	0.341	0.521
Shinjuku	0.208	0.467	0.000	0.453	0.740
Kamata	0.167	0.439	0.424	0.000	1.566
Shinbashi	0.448	0.490	0.505	0.984	0.000

Note that D(p‖q) ≠ D(q‖p): KL divergence is asymmetric. For example, D(Kamata‖Shinbashi) = 1.566 while D(Shinbashi‖Kamata) = 0.984. For every other station p, the largest entry in its row is D(p‖Shinbashi), suggesting that Shinbashi occupies a behaviorally distinct region of M_U.
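The matrix can be summarized per pair with a few lines of code. The sketch below computes a symmetrized value and a signed asymmetry for each pair. I take Sym-KL to be the mean of the two directions; that definition is an assumption on my part, but it is consistent with the Sym-KL values reported later in this article (for example 1.275 for Kamata and Shinbashi).

```python
stations = ["Ginza1", "Ginza2", "Shinjuku", "Kamata", "Shinbashi"]

# kl[i][j] = D(stations[i] || stations[j]), copied from the matrix above.
kl = [
    [0.000, 0.200, 0.175, 0.165, 0.665],
    [0.152, 0.000, 0.319, 0.341, 0.521],
    [0.208, 0.467, 0.000, 0.453, 0.740],
    [0.167, 0.439, 0.424, 0.000, 1.566],
    [0.448, 0.490, 0.505, 0.984, 0.000],
]

pairs = {}
for i in range(len(stations)):
    for j in range(i + 1, len(stations)):
        sym = 0.5 * (kl[i][j] + kl[j][i])   # assumed Sym-KL definition
        asym = kl[i][j] - kl[j][i]          # D(p||q) - D(q||p)
        pairs[(stations[i], stations[j])] = (sym, asym)

# Print pairs from most to least divergent.
for (p, q), (sym, asym) in sorted(pairs.items(), key=lambda kv: -kv[1][0]):
    print(f"{p:9s} {q:9s} sym={sym:.3f} asym={asym:+.3f}")
```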

This figure shows how much KL divergence arises per feature for each station pair. Row 1: D(p‖q) — forward direction. Row 2: D(q‖p) — reverse direction. Row 3: D(p‖q) − D(q‖p) — asymmetry (with zero line). The third row in particular reveals which features are driving the KL asymmetry.

e-Geodesics and m-Geodesics

The dual flat structure enables two types of geodesics between stations:

e-geodesic: Straight line in θ-space (natural parameter space)
m-geodesic: Straight line in η-space (expectation parameter space)

The asymmetry between these paths reveals the curvature of the behavioral manifold. For pairs with high KL divergence, the midpoint of the e-geodesic and m-geodesic can differ significantly, indicating nonlinear relationships between generative mechanisms and observed statistics.

The figure below shows the e-geodesic (left) and m-geodesic (right) between two observation points, along with the KL divergence computed at intermediate points along each path (center). t=0 represents the start point and t=1 the end point. At t=0 and t=1 both paths pass through the same station, so the divergence between them is zero; it reaches its maximum at t=0.5, the midpoint.

Pair	Sym-KL	Max Div	Mean Div	Nonlinearity
Kamata↔Shinbashi	1.27542	0.04743	0.02420	High
Shinjuku↔Shinbashi	0.62270	0.01401	0.00695	Medium
Ginza1↔Shinbashi	0.55669	0.01352	0.00672	Medium
Ginza2↔Shinjuku	0.39324	0.00896	0.00446	Medium
Ginza2↔Shinbashi	0.50506	0.00890	0.00442	Medium
Ginza2↔Kamata	0.39019	0.00546	0.00269	Low
Shinjuku↔Kamata	0.43869	0.00372	0.00182	Low
Ginza1↔Ginza2	0.17615	0.00202	0.00099	Low
Ginza1↔Shinjuku	0.19169	0.00178	0.00087	Low
Ginza1↔Kamata	0.16602	0.00081	0.00040	Low

Max Div: max geodesic gap
Mean Div: mean geodesic gap

The maximum geodesic gap is 0.047, which is small — indicating that this manifold is relatively flat. The e-geodesic and m-geodesic paths can be considered approximately identical.
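For a single Normal feature, the two geodesics and the gap between them can be sketched as follows. The e-geodesic interpolates linearly in θ and the m-geodesic interpolates linearly in η; both are mapped back to (μ, σ), and the gap at each t is measured as the KL divergence from the e-geodesic point to the m-geodesic point. That reading of "geodesic gap" is my assumption, since the exact definition behind the table is not spelled out here, and the second endpoint is a hypothetical station.

```python
import math

def to_theta(mu, s):
    return (mu / s**2, -1 / (2 * s**2))

def to_eta(mu, s):
    return (mu, mu**2 + s**2)

def from_theta(t1, t2):
    var = -1 / (2 * t2)
    return t1 * var, math.sqrt(var)

def from_eta(e1, e2):
    return e1, math.sqrt(e2 - e1**2)

def lerp(a, b, t):
    """Straight-line interpolation between coordinate tuples a and b."""
    return tuple((1 - t) * x + t * y for x, y in zip(a, b))

def kl_normal(mu1, s1, mu2, s2):
    return math.log(s2 / s1) + (s1**2 + (mu1 - mu2) ** 2) / (2 * s2**2) - 0.5

def geodesic_gap(p, q, t):
    """KL from the e-geodesic point to the m-geodesic point at parameter t."""
    e_pt = from_theta(*lerp(to_theta(*p), to_theta(*q), t))
    m_pt = from_eta(*lerp(to_eta(*p), to_eta(*q), t))
    return kl_normal(*e_pt, *m_pt)

p = (1.4625, 0.5319)  # Shinjuku real_speed_mean (mu, sigma)
q = (1.20, 0.35)      # hypothetical second station

gaps = [geodesic_gap(p, q, k / 10) for k in range(11)]
for k, g in enumerate(gaps):
    print(f"t={k / 10:.1f} gap={g:.5f}")
```

Both paths start and end at the same distributions, so the gap vanishes at t=0 and t=1 and is positive in between.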

PCA Distance vs Sym-KL Divergence

Image 5 illustrates the discrepancy between PCA and KL divergence. Each point represents an observation station projected onto two PCA components from the 15-dimensional θ feature space. The edges between points encode Sym-KL divergence: thicker edges indicate greater distributional difference. Notably, pairs such as Ginza1 and Shinjuku appear far apart in PCA space despite having one of the smaller KL divergences — a clear demonstration of how the Euclidean and Fisher metrics can produce conflicting orderings.

Pair	PCA dist	Sym-KL	PCA dist (norm)	Sym-KL (norm)	diff (KL - PCA)
Kamata vs Shinbashi	7.272	1.275	0.899	1.000	0.101
Shinjuku vs Shinbashi	7.907	0.623	1.000	0.412	-0.588
Ginza1 vs Shinbashi	6.019	0.557	0.699	0.352	-0.347
Ginza2 vs Shinbashi	3.914	0.505	0.364	0.306	-0.059
Shinjuku vs Kamata	6.168	0.439	0.723	0.246	-0.477
Ginza2 vs Shinjuku	6.910	0.393	0.841	0.205	-0.636
Ginza2 vs Kamata	3.574	0.390	0.310	0.202	-0.108
Ginza1 vs Shinjuku	4.954	0.192	0.530	0.023	-0.507
Ginza1 vs Ginza2	2.811	0.176	0.189	0.009	-0.180
Ginza1 vs Kamata	1.624	0.166	0.000	0.000	0.000
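The normalized columns appear to be plain min-max scalings of the raw columns, and the last column their difference. The sketch below recomputes them from the raw PCA distance and Sym-KL values in the table; the min-max assumption is mine, but it reproduces the tabulated numbers to rounding.

```python
# (pair, PCA distance, Sym-KL) copied from the table above.
rows = [
    ("Kamata vs Shinbashi", 7.272, 1.275),
    ("Shinjuku vs Shinbashi", 7.907, 0.623),
    ("Ginza1 vs Shinbashi", 6.019, 0.557),
    ("Ginza2 vs Shinbashi", 3.914, 0.505),
    ("Shinjuku vs Kamata", 6.168, 0.439),
    ("Ginza2 vs Shinjuku", 6.910, 0.393),
    ("Ginza2 vs Kamata", 3.574, 0.390),
    ("Ginza1 vs Shinjuku", 4.954, 0.192),
    ("Ginza1 vs Ginza2", 2.811, 0.176),
    ("Ginza1 vs Kamata", 1.624, 0.166),
]

def min_max(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

pca_norm = min_max([r[1] for r in rows])
kl_norm = min_max([r[2] for r in rows])
diff = [k - p for k, p in zip(kl_norm, pca_norm)]

# Sort by the size of the disagreement between the two normalized measures.
for name, p, k, d in sorted(zip([r[0] for r in rows], pca_norm, kl_norm, diff),
                            key=lambda x: x[3]):
    print(f"{name:22s} pca={p:.3f} kl={k:.3f} diff={d:+.3f}")
```

Sorting by the difference puts the pairs that PCA distance overstates relative to Sym-KL at the top of the output.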

Discussion

PCA Distance vs KL Divergence

Although PCA was used for visualization, the actual analysis uses the 15-dimensional θ/η vectors. Comparing normalized PCA distance and Sym-KL on a 0–1 scale reveals different orderings, and pairs involving Shinjuku show particularly large discrepancies. This arises from a fundamental difference in what each quantity measures: the PCA distance is a Euclidean distance between θ vectors after reduction to 2 components, whereas KL divergence compares the full distributions across all 15 coordinates and agrees with the Fisher metric only locally, for nearby distributions. A large KL divergence means the two distributions are statistically distinguishable with fewer samples; it captures how different the distributions are, not just how far apart their parameter vectors sit in Euclidean space.

The manifold itself is relatively flat (maximum geodesic gap = 0.047), meaning e-geodesic and m-geodesic paths are nearly identical. The discrepancy with PCA is therefore not a curvature effect but a metric effect: information geometry preserves the statistical structure of distributions, while PCA discards it. The dual structure further separates generative parameters from observable statistics — a distinction PCA cannot make. Unlike regression approaches that treat residuals as noise, the adjoint functor framework (developed in Article 5) will interpret residuals as structural untranslatability between urban form and pedestrian behavior.

Asymmetry

D(Kamata‖Shinbashi) > D(Shinbashi‖Kamata) means there are behaviors among Kamata pedestrians that rarely occur at Shinbashi. Features that produce asymmetry are those where one station has behavior patterns that simply don't appear at the other. For example, if the stop_ratio asymmetry is large, the stopping-rate distribution at Kamata has a markedly broader or narrower tail than the one at Shinbashi. This asymmetry can represent functionally meaningful urban differences, a property that symmetric measures such as Euclidean distance in PCA space cannot capture.

Conclusion

By constructing M_U as a statistical manifold, we can quantify pedestrian behavioral differences between stations with geometric precision. The dual flat structure supplies two coordinate systems (θ and η) and a divergence on M_U, which together enable geometrically meaningful comparison of station distributions. This approach moves beyond simple correlation analysis to detect structural differences in how urban environments shape human movement patterns. The dual structure of M_U will also serve as the foundation for decomposing intervention effects in Series 2.

In the next article, I'll compare this pedestrian manifold with a spatial manifold constructed from OSMnx street network features, establishing the adjoint relationship between urban structure and pedestrian behavior.