Toki Hirose

Posted on Apr 16 • Edited on Apr 25

5. Comparing Pedestrian Manifolds with OSMnx Spatial Manifolds

#python #datascience #analytics #machinelearning

Note: I use AI assistance to draft and polish the English, but the analysis, interpretation, and core ideas are my own. Learning to write technical English is itself part of this project.

Introduction

In previous articles, I detected pedestrian trajectories from video at multiple observation points and extracted features — speed, acceleration, speed skewness, stop ratio, path straightness, and dwell time — to construct statistical distributions. I then converted each station's distributions to natural parameters θ and their duals, the expectation parameters η, forming the statistical manifold M_U. This gave manifold coordinates for each observation point (Shinjuku, Shinbashi, Ginza1, Ginza2, Kamata), and the KL divergence between their e-geodesics and m-geodesics revealed the curvature of M_U.

In this article, I construct a second manifold — M_C — from the street network geometry surrounding each intersection. Where M_U represents the manifold of pedestrian behavior, M_C represents the spatial geometric structure through which pedestrians move. The main analytical question is: what kind of discrepancy arises between these two manifolds?

The Stack as Adjoint Functors

The hypothesis in this article is grounded in Bratton's The Stack: if the city is treated as a computational platform, does the urban structure ground the dynamic patterns of its users? Specifically, do the edge lengths, network centrality, and node count of an area determine pedestrian behavior, that is, how fast people walk, how they accelerate, and how often they stop?

In information-geometric terms:

M_C is the manifold constructed from the distributional geometry of the street network within 420m (5-minute walk) of each observation point — distributions of edge lengths, circuity, node connectivity, betweenness centrality, and bearing
M_U is the statistical manifold from Article 4, where each station is a point defined by the fitted trajectory feature distributions

The verification method is to check whether the positions of each observation point on M_C and M_U align in their pairwise distance ordering after normalization. If they do, the User Layer is directly shaped by the City Layer. If not, either M_C is under-expressive, or the user behavior reflects influences from layers beyond the City Layer — the Interface Layer or Address Layer in Bratton's terms. To investigate this variation, a Pythagorean decomposition through the Interface Layer intervention is also planned.

The adjoint functor hypothesis: F: M_C → M_U (free functor, predictive) and G: M_U → M_C (forgetful functor, projection). An adjoint relationship means M_C and M_U are not independent — changes in one systematically correspond to changes in the other. Where the correspondence breaks down is precisely where something beyond the City Layer is at work.

Constructing M_C: Street Network Features via OSMnx

Feature Extraction

For each station, I extract the street network within a 420m walking-distance radius (5-minute walk) using OSMnx. Each network edge (street segment) is treated as one observation, giving a distribution of network geometry at each station — parallel to how M_U was built from pedestrian trajectories.

Five features are extracted at the edge/node level:

Feature	Distribution	Parameters
`edge_length`	Log-normal	(s, loc=0, scale)
`circuity_mapped`	Beta	(α, β)
`node_degree`	Gamma	(a, loc=0, scale)
`betweenness_centrality`	Beta	(α, β)
`bearing_rad`	Von-Mises	(κ, μ, scale=1)
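As a minimal sketch of how the parameter conventions in this table could map onto scipy.stats (an assumption: the article does not name the fitting library), the following fits a log-normal and a gamma with the location fixed at zero. The synthetic samples and variable names are hypothetical stand-ins for real edge lengths and node degrees.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical stand-ins for per-edge lengths (m) and per-node degrees.
edge_length = rng.lognormal(mean=3.0, sigma=0.8, size=5000)
node_degree = rng.gamma(shape=6.0, scale=0.5, size=5000)

# floc=0 pins the location parameter, leaving (s, scale) and (a, scale) free,
# which matches the "(s, loc=0, scale)" and "(a, loc=0, scale)" rows above.
s_len, loc_len, scale_len = stats.lognorm.fit(edge_length, floc=0)
a_deg, loc_deg, scale_deg = stats.gamma.fit(node_degree, floc=0)

print(f"log-normal: s={s_len:.3f}, scale={scale_len:.3f}")
print(f"gamma:      a={a_deg:.3f}, scale={scale_deg:.3f}")
```

Fixing the location keeps each fit inside a two-parameter exponential family, which is what allows the later conversion to θ and η coordinates.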

In addition, one scalar network-level statistic is retained as a fixed coordinate (not a fitted distribution):

Feature	Meaning
`edge_density_km`	Street km per km² — network density

The circuity of an edge is the ratio of actual edge length to straight-line distance between its endpoints. A value of 1.0 means perfectly straight; higher values indicate detours. This captures how convoluted the street geometry is, independent of segment length. The bearing_rad encodes the dominant orientation of each street segment, fitted to a Von-Mises distribution to capture the angular structure of the street grid.
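The circuity definition above can be sketched for a single edge as follows. This is an illustration under stated assumptions, not the extraction code used for the article: the helper names, the Earth-radius constant, and the example coordinates are hypothetical, and the mapping from raw circuity to the Beta-supported `circuity_mapped` feature is not specified in the text, so it is not reproduced here.

```python
import math

EARTH_RADIUS_M = 6_371_009  # mean Earth radius in metres (assumed constant)


def great_circle_m(lat1, lon1, lat2, lon2):
    """Haversine distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def edge_circuity(length_m, lat1, lon1, lat2, lon2):
    """Ratio of along-street length to endpoint-to-endpoint distance."""
    straight = great_circle_m(lat1, lon1, lat2, lon2)
    if straight == 0:
        return float("nan")  # self-loop: circuity is undefined
    return length_m / straight


# Hypothetical edge: endpoints 0.001 degrees of latitude apart (about 111 m),
# with an along-street length of 139 m.
c = edge_circuity(139.0, 35.000, 139.700, 35.001, 139.700)
print(round(c, 3))
```

A value near 1.25 here would mean the street segment is about 25% longer than the straight line between its endpoints.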

One preprocessing step is required for bearing_rad. OSMnx represents streets as a directed graph, so an east–west road appears as two edges at approximately 90° and 270°. If all edge bearings are used directly in the range [−π, π], they cancel out and the distribution collapses. To avoid this, a double-angle trick is applied: bearings are first mapped to [0°, 180°) via bearing % 180 to treat the street as undirected, then doubled to [0°, 360°) to restore the circular structure before fitting the Von-Mises distribution.
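The cancellation problem and the double-angle fix can be seen in a small numeric sketch. The bearings below are hypothetical values for a single east–west street, and the mean resultant length R is used only as a quick indicator of how concentrated the angles are; the Von-Mises fit itself is not shown.

```python
import numpy as np


def resultant_length(angles_rad):
    """Mean resultant length R in [0, 1]; R near 0 means no preferred direction."""
    return float(np.hypot(np.mean(np.cos(angles_rad)),
                          np.mean(np.sin(angles_rad))))


# Hypothetical directed-edge bearings (degrees) for an east-west street:
# each segment appears once per direction, 180 degrees apart.
bearings_deg = np.array([85.0, 90.0, 95.0, 265.0, 270.0, 275.0])

raw = np.deg2rad(bearings_deg)
# Fold to [0, 180) to treat the street as undirected, then double to [0, 360).
axial = np.deg2rad(2.0 * (bearings_deg % 180.0))

r_raw = resultant_length(raw)
r_axial = resultant_length(axial)

# Mean direction of the doubled angles, halved to recover the street axis.
mean_doubled = np.arctan2(np.mean(np.sin(axial)), np.mean(np.cos(axial)))
axis_deg = float(np.rad2deg(np.mod(mean_doubled, 2 * np.pi)) / 2.0)

print(f"R raw = {r_raw:.3f}, R doubled = {r_axial:.3f}, axis = {axis_deg:.1f} deg")
```

With the raw bearings the opposite directions cancel and R is essentially zero; after folding and doubling, R is close to 1 and the recovered axis is the east–west direction.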

The dual coordinate conversion follows the same procedure as M_U:

Distribution	θ	η
Log-normal(s, scale)	(m/s², −1/(2s²)) where m=log(scale)	(m, m²+s²)
Beta(α, β)	(α−1, β−1)	(ψ(α)−ψ(α+β), ψ(β)−ψ(α+β))
Gamma(a, scale)	(a−1, −1/scale)	(ψ(a)+log(scale), a·scale)
Von-Mises(κ, μ)	(κ cos μ, κ sin μ)	(A(κ) cos μ, A(κ) sin μ) where A(κ)=I₁(κ)/I₀(κ)
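The conversions in this table can be written down directly. The sketch below assumes SciPy's special functions for the digamma function ψ and the modified Bessel functions I₀ and I₁; the function names are hypothetical and only illustrate the formulas, they are not the code used to produce the coordinates in this article.

```python
import numpy as np
from scipy.special import digamma, i0, i1


def lognormal_dual(s, scale):
    """(theta, eta) for a log-normal with shape s and loc = 0."""
    m = np.log(scale)
    theta = (m / s**2, -1.0 / (2.0 * s**2))
    eta = (m, m**2 + s**2)
    return theta, eta


def beta_dual(alpha, beta):
    """(theta, eta) for Beta(alpha, beta)."""
    theta = (alpha - 1.0, beta - 1.0)
    eta = (digamma(alpha) - digamma(alpha + beta),
           digamma(beta) - digamma(alpha + beta))
    return theta, eta


def gamma_dual(a, scale):
    """(theta, eta) for a gamma with shape a and loc = 0."""
    theta = (a - 1.0, -1.0 / scale)
    eta = (digamma(a) + np.log(scale), a * scale)
    return theta, eta


def vonmises_dual(kappa, mu):
    """(theta, eta) for Von-Mises(kappa, mu), with A(kappa) = I1/I0."""
    mean_len = i1(kappa) / i0(kappa)
    theta = (kappa * np.cos(mu), kappa * np.sin(mu))
    eta = (mean_len * np.cos(mu), mean_len * np.sin(mu))
    return theta, eta


theta_b, eta_b = beta_dual(2.0, 3.0)
print(theta_b, eta_b)
```

For very large κ the ratio I₁(κ)/I₀(κ) can overflow in floating point; the exponentially scaled Bessel functions in SciPy (`i0e`, `i1e`) are one way around that, although the κ values in this article are small enough that it should not matter.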

Total θ/η dimension: 2 (Log-normal) + 2 (Beta) + 2 (Gamma) + 2 (Beta) + 2 (Von-Mises) = 10 dimensions

Plus 1 scalar coordinate: edge_density_km → 11-dimensional M_C point

Written in terms of the fitted parameters, with ρ_edge denoting edge_density_km, the manifold point for station s on M_C is:

c_s = (s_len, scale_len, α_circ, β_circ, a_deg, scale_deg, α_btwn, β_btwn, κ·cosμ, κ·sinμ, ρ_edge) ∈ M_C

The scalar coordinate edge_density_km (ρ_edge above) has no distributional interpretation and is excluded from the dually flat structure.

M_C Coordinates

θ-space coordinates (10 distributional dims + 1 scalar):

station	len:m/s²	len:−1/(2s²)	circ:α−1	circ:β−1	deg:a−1	deg:−1/sc	btwn:α−1	btwn:β−1	bear:κcosμ	bear:κsinμ	edge_density_km
Ginza1	2.1281	−0.4001	−0.7925	1.4501	8.6317	−1.5537	−0.1772	40.8849	−0.0623	−0.0035	177.89
Ginza2	2.1385	−0.4016	−0.7925	1.4401	8.3871	−1.5175	−0.1789	40.9422	−0.0615	−0.0032	178.12
Shinjuku	2.5291	−0.4738	−0.7959	1.3335	4.8897	−1.0181	−0.3197	43.3613	0.1130	0.0216	196.20
Kamata	2.9538	−0.4843	−0.7795	0.9660	5.3845	−1.0664	−0.3520	21.5502	−0.0228	−0.0567	109.69
Shinbashi	2.5988	−0.4740	−0.7842	1.3007	5.9866	−1.1638	−0.3110	30.6136	0.0119	0.0035	157.19

M_C Geometry: Geodesics and Curvature

KL divergence matrix (M_C, asymmetric):

	Ginza1	Ginza2	Shinjuku	Kamata	Shinbashi
Ginza1	0.000	0.000	0.110	0.198	0.054
Ginza2	0.000	0.000	0.103	0.192	0.050
Shinjuku	0.149	0.137	0.000	0.193	0.055
Kamata	0.232	0.224	0.262	0.000	0.084
Shinbashi	0.064	0.058	0.066	0.077	0.000

The dually flat structure is the same as for M_U. M_C has a 10-dimensional θ/η space (distributional features only):

Log-normal(edge_length): 2 dims
Beta(circuity_mapped): 2 dims
Gamma(node_degree): 2 dims
Beta(betweenness_centrality): 2 dims
Von-Mises(bearing_rad): 2 dims

edge_density_km (1 scalar) is not part of the exponential family structure and is excluded from the geodesic/curvature analysis.

e-geodesic: straight line in θ-space, θ(t) = (1−t)·θ_A + t·θ_B

m-geodesic: straight line in η-space, η(t) = (1−t)·η_A + t·η_B

The KL divergence between e-geodesic and m-geodesic at each t measures the curvature of M_C. The pedestrian manifold M_U had a geodesic gap of approximately 0.047; the spatial manifold M_C is considerably flatter — consistent with the expectation that street network geometry is more uniform across stations than human behavior.
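To make the geodesic gap concrete, here is a minimal sketch for a single normal family, which is the log-space form of the log-normal used for edge_length. The endpoints are arbitrary illustrative values, not station data, and the direction of the divergence (from the e-geodesic point to the m-geodesic point) is an assumption, since the text does not fix it.

```python
import math


def to_theta(m, var):
    return (m / var, -1.0 / (2.0 * var))


def from_theta(theta):
    var = -1.0 / (2.0 * theta[1])
    return (theta[0] * var, var)


def to_eta(m, var):
    return (m, m * m + var)


def from_eta(eta):
    return (eta[0], eta[1] - eta[0] ** 2)


def kl_normal(p, q):
    """KL(p || q) for univariate normals given as (mean, variance)."""
    (m1, v1), (m2, v2) = p, q
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)


def lerp(a, b, t):
    return tuple((1 - t) * x + t * y for x, y in zip(a, b))


def geodesic_gap(p_a, p_b, t=0.5):
    """KL from the e-geodesic point to the m-geodesic point at parameter t."""
    e_point = from_theta(lerp(to_theta(*p_a), to_theta(*p_b), t))
    m_point = from_eta(lerp(to_eta(*p_a), to_eta(*p_b), t))
    return kl_normal(e_point, m_point)


# Two normals with the same variance but different means.
gap = geodesic_gap((0.0, 1.0), (2.0, 1.0))
print(f"gap at t=0.5: {gap:.4f}")
```

At the midpoint the e-geodesic keeps the variance at 1 while the m-geodesic inflates it to 2, and that difference is what the gap measures. When the two endpoints coincide, both geodesics collapse to the same point and the gap is zero.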

PCA and KL Divergence

The figure shows two aspects of M_C: a PCA projection of the 11-dimensional manifold coordinates, and a graph where edge thickness encodes symmetric KL divergence — thinner edges indicate more similar network geometry.

Several features are immediately visible. Ginza1 and Ginza2 occupy nearly the same position in both views, consistent with their physical proximity: they are adjacent observation points on the same street network. Shinbashi sits near the center of both representations, close in network geometry to all other stations. Kamata is clearly separated, reflecting its lower density and less-connected street structure.

At first glance, the PCA and KL divergence orderings look consistent. However, a difference appears when examining the Ginza–Shinjuku pair. In the PCA projection, these two stations appear fairly distant. But their symmetric KL divergence is not large relative to other pairs — and KL divergence is the geometrically meaningful distance on the information manifold. This illustrates a general limitation: PCA collapses the high-dimensional manifold structure into two dimensions and can distort relative distances. The KL divergence, computed directly in θ-space, preserves the information-geometric distances that PCA may misrepresent.

M_C vs M_U: Pairwise Distance Comparison

To compare M_C and M_U, I compute the symmetric KL divergence between all station pairs in each manifold, then define the adjoint gap per pair:

gap(A, B) = Sym-KL_U(A,B)_norm − Sym-KL_C(A,B)_norm

where both distances are normalized to [0,1] within their respective manifolds. A positive gap means the pair is behaviorally more distinct than spatial structure predicts; a negative gap means the street geometry separates them more than their pedestrian behavior does.

Pair	Sym-KL (M_C)	Sym-KL (M_U)	M_C (norm)	M_U (norm)	adj_gap
Kamata vs Shinbashi	0.1632	2.5508	0.354	1.000	+0.646
Shinjuku vs Shinbashi	0.1241	1.2454	0.269	0.412	+0.143
Ginza1 vs Shinbashi	0.1197	1.1134	0.259	0.352	+0.093
Ginza2 vs Shinbashi	0.1088	1.0101	0.236	0.306	+0.070
Ginza1 vs Ginza2	0.0004	0.3523	0.000	0.009	+0.009
Ginza2 vs Shinjuku	0.2478	0.7865	0.538	0.205	−0.333
Ginza1 vs Shinjuku	0.2666	0.3834	0.579	0.023	−0.556
Ginza2 vs Kamata	0.4167	0.7804	0.905	0.202	−0.703
Shinjuku vs Kamata	0.4605	0.8774	1.000	0.246	−0.754
Ginza1 vs Kamata	0.4311	0.3320	0.936	0.000	−0.936
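The normalized columns and the adjoint gap can be recomputed from the two Sym-KL columns of this table. The sketch below assumes min-max scaling within each manifold, which reproduces the tabulated values up to rounding in the last digit; the variable names are hypothetical.

```python
# Sym-KL values copied from the table above: pair -> (M_C, M_U)
sym_kl = {
    ("Kamata", "Shinbashi"): (0.1632, 2.5508),
    ("Shinjuku", "Shinbashi"): (0.1241, 1.2454),
    ("Ginza1", "Shinbashi"): (0.1197, 1.1134),
    ("Ginza2", "Shinbashi"): (0.1088, 1.0101),
    ("Ginza1", "Ginza2"): (0.0004, 0.3523),
    ("Ginza2", "Shinjuku"): (0.2478, 0.7865),
    ("Ginza1", "Shinjuku"): (0.2666, 0.3834),
    ("Ginza2", "Kamata"): (0.4167, 0.7804),
    ("Shinjuku", "Kamata"): (0.4605, 0.8774),
    ("Ginza1", "Kamata"): (0.4311, 0.3320),
}


def min_max(values):
    """Rescale a list of values to [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


pairs = list(sym_kl)
c_norm = min_max([sym_kl[p][0] for p in pairs])
u_norm = min_max([sym_kl[p][1] for p in pairs])

# adjoint gap = normalized M_U distance minus normalized M_C distance
adj_gap = {p: u - c for p, c, u in zip(pairs, c_norm, u_norm)}

for p in sorted(pairs, key=adj_gap.get, reverse=True):
    print(f"{p[0]} vs {p[1]}: {adj_gap[p]:+.3f}")
```

Because min-max scaling depends on the extreme pairs in each manifold, adding or removing a station can shift every normalized value, which is worth keeping in mind when more observation points are added.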

If the adjoint hypothesis held, the M_C and M_U orderings would correlate positively. The Spearman rank correlation between the two Sym-KL sequences is ρ = −0.297 (p = 0.405): not statistically significant with only ten pairs, and negative in sign. In this sample, pairs that are farther apart in M_C tend to be less behaviorally distinct in M_U, not more. I read this as more than a null result: the mismatch between spatial and behavioral separation points in one direction across the five stations, although the p-value means that direction should be treated as suggestive rather than established.
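The rank correlation can be checked from the ten Sym-KL pairs in the table. This sketch assumes `scipy.stats.spearmanr` with its default two-sided test; the list names are hypothetical and the values are copied in table order.

```python
from scipy.stats import spearmanr

# Sym-KL per pair, in the same row order as the comparison table.
sym_kl_c = [0.1632, 0.1241, 0.1197, 0.1088, 0.0004,
            0.2478, 0.2666, 0.4167, 0.4605, 0.4311]
sym_kl_u = [2.5508, 1.2454, 1.1134, 1.0101, 0.3523,
            0.7865, 0.3834, 0.7804, 0.8774, 0.3320]

rho, p_value = spearmanr(sym_kl_c, sym_kl_u)
print(f"rho = {rho:.3f}, p = {p_value:.3f}")
```

One caveat on reading the p-value: the ten pairs are built from only five stations, so they are not independent observations, and the test should be taken as indicative only.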

Where the Correspondence Holds — and Where It Breaks

What M_C tells us

On the spatial manifold M_C, Kamata is the most isolated station: its normalized M_C distance to other stations spans 0.354 (vs Shinbashi) to 1.000 (vs Shinjuku). This reflects a structural property of the observation point: unlike the other stations, where the street network crosses the railway easily on multiple sides, the Kamata observation point requires walking approximately 200–300 meters along the tracks before reaching a grade crossing. As a result, the local network is less centrally connected — lower betweenness centrality and lower edge density — than the other stations, all of which are in more uniformly accessible urban cores.

Ginza1 and Ginza2 are the closest pair on M_C (Sym-KL ≈ 0.0004, norm = 0.000). Because they are adjacent observation points on the same street grid, this is expected, and it serves as a consistency check: the feature extraction and fitting procedure correctly identifies near-identical network environments as near-identical manifold points.

What M_U tells us — and where M_C fails to predict it

On the behavioral manifold M_U, the most isolated station is Shinbashi, not Kamata. The Kamata–Shinbashi pair has the largest symmetric KL divergence in M_U (2.5508, norm = 1.000), and every pair involving Shinbashi sits near the top of the M_U distance ranking. Spatially, Shinbashi is a moderate-density area, geometrically similar to Shinjuku and Ginza — yet behaviorally its pedestrian distributions are the most distinct of all five stations.

Kamata's position on M_U is equally surprising. The Ginza1–Kamata pair has the smallest symmetric KL divergence on M_U (0.3320, norm = 0.000) — meaning that despite being the most spatially isolated station in M_C, Kamata's pedestrian behavior is closest to Ginza1's. The City Layer predicts Kamata to be an outlier; the User Layer places it near the center of the behavioral space.

Ginza1 and Ginza2 are also close in M_U (Sym-KL = 0.3523, norm = 0.009), as expected from their spatial proximity. However, they are not the closest pair in M_U — that is Ginza1 vs Kamata — so spatial adjacency does not fully determine behavioral similarity even for near-identical network environments.

Interpretation

These divergences identify where the adjoint hypothesis breaks down. A large positive adjoint gap (M_U norm − M_C norm >> 0) means the pair is behaviorally more distinct than spatial structure predicts — the City Layer under-explains the difference. The top cases are Kamata–Shinbashi (+0.646), Shinjuku–Shinbashi (+0.143), and the Ginza–Shinbashi pairs. In each case, the street network geometry does not account for the behavioral separation.

Negative gaps (Ginza1–Kamata: −0.936; Shinjuku–Kamata: −0.754) mean the City Layer over-predicts the behavioral difference: stations that look geometrically distinct turn out to have similar pedestrian distributions. This suggests that some behavioral features are insensitive to the specific street network differences between central Tokyo and lower-density areas, or that common attractor effects, such as major transit hubs and commercial flows, produce convergent behavior across dissimilar spatial environments.

Taken together, both directions of mismatch point to the same conclusion: the City Layer alone cannot account for how pedestrians actually move. Physical space sets constraints, but behavior emerges from the full stack of layers that operate within it. In Bratton's framework, the Interface Layer — the signs, messages, and social signals that users encounter — and the Address Layer — the identities and affordances assigned to locations — each contribute independently to behavioral outcomes. The adjoint gap is a geometric measure of how much explanatory work those layers are doing, above and beyond what the street network provides.

One limitation specific to this dataset must be acknowledged here. All M_U measurements in this study were collected during solo standing demonstrations. This means the behavioral distributions captured on M_U already reflect the presence of an Interface Layer perturbation. To isolate the pure City Layer baseline — the M_U point a station would occupy without any intervention — measurements under neutral conditions are needed. Obtaining that baseline is a prerequisite for Series 2: it is the reference point p from which the intervention-induced shift will be measured. The data collection for this is ongoing.

Discussion: Structural Untranslatability

In this article, M_C and M_U have been compared through normalized pairwise distances alone. The two manifolds live in different spaces — M_C is defined over street network feature distributions, M_U over pedestrian trajectory distributions — and no formal mapping between them has been constructed here. The adjoint gap captures the discrepancy between their distance structures, but it does not yet decompose that discrepancy into interpretable geometric terms.

That decomposition is the task of Series 2. There, for each station, a correspondence point q on M_U will be defined as the "spatially predicted" behavioral distribution — the M_U point that the City Layer, via the adjoint functor F: M_C → M_U, would predict from M_C alone. Once that correspondence is established, the Pythagorean identity on the dually flat manifold M_U gives:

D(p ‖ r) = D(p ‖ q) + D(q ‖ r) + ⟨θ_q − θ_r, η_p − η_q⟩

where p is the observed baseline distribution, r is the distribution under intervention, and q is the spatially predicted point. The inner product term vanishes when the m-geodesic from p to q is orthogonal at q to the e-geodesic from q to r, that is, when the intervention effect lies entirely within the submanifold predicted by M_C. In that case the identity reduces to the generalized Pythagorean relation D(p ‖ r) = D(p ‖ q) + D(q ‖ r). When the term does not vanish, the residual is the component of the behavioral shift that the City Layer cannot account for: the measurable footprint of the Interface Layer.
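The three-point identity above holds for any three members of one exponential family, which can be checked numerically. The sketch below uses univariate normals with arbitrary illustrative parameters; these are not station data, and no correspondence point q has been constructed yet, so this only verifies the algebra.

```python
import math


def theta(m, var):
    """Natural parameters of N(m, var)."""
    return (m / var, -1.0 / (2.0 * var))


def eta(m, var):
    """Expectation parameters of N(m, var)."""
    return (m, m * m + var)


def kl(p, q):
    """KL(p || q) for univariate normals given as (mean, variance)."""
    (m1, v1), (m2, v2) = p, q
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def inner(a, b):
    return sum(x * y for x, y in zip(a, b))


# Arbitrary illustrative points given as (mean, variance).
p, q, r = (0.0, 1.0), (1.0, 4.0), (-1.0, 0.25)

lhs = kl(p, r)
cross = inner(sub(theta(*q), theta(*r)), sub(eta(*p), eta(*q)))
rhs = kl(p, q) + kl(q, r) + cross

print(f"D(p||r) = {lhs:.4f}, right-hand side = {rhs:.4f}, cross term = {cross:.4f}")
```

For these three points the cross term is far from zero, so the plain Pythagorean sum D(p ‖ q) + D(q ‖ r) would badly overestimate D(p ‖ r); the identity only balances once the inner product is included.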

What this article establishes is the prerequisite: that the adjoint gap is nonzero and structured in a non-trivial way. If M_C perfectly predicted M_U, there would be nothing left for the Interface Layer to explain. The discrepancies documented here — Shinbashi's behavioral isolation, Kamata's unexpected proximity to Ginza in M_U — are precisely the locations where the Series 2 decomposition will be most informative.

The adjoint functor F: M_C → M_U, when eventually operationalized in Series 2, is more naturally interpreted as an m-projection — M_C constrains the expectation parameters η of M_U (observable averages such as mean speed and stop ratio) rather than the natural parameters θ. The choice of projection type will determine how the correspondence point q is computed and, consequently, how the Interface Layer residual is decomposed.

Conclusion

By comparing M_C and M_U, we can see where the correspondence between urban spatial structure and pedestrian behavior holds and where it breaks down. The dually flat structure of both manifolds makes this comparison geometrically tractable: distances, projections, and residuals all have precise meanings in terms of statistical distinguishability.

The next steps are twofold. First, more observation points will be added to strengthen the comparison. Second, measurements will be conducted both with and without the standing demonstration message, making it possible to observe how the presence of an intervention shifts each station's position on M_U. Since all data in this study was collected while a demonstration was in progress, the baseline — pedestrian behavior without any intervention — has not yet been captured.

So far, this series has examined only the relationship between the City Layer and the User Layer. The next step is to investigate the influence of the Interface Layer directly. The Address Layer is also a candidate influence, but how to operationalize and measure it remains an open question.

Citation

Boeing, G. (2025). Modeling and Analyzing Urban Networks and Amenities with OSMnx. Geographical Analysis 57 (4), 567-577. doi:10.1111/gean.70009
Bratton, B. H. (2015). The Stack: On Software and Sovereignty. MIT Press. ISBN: 9780262029575
