Introduction
Lateral flattening is the process of converting nested or hierarchical JSON structures into flat, tabular formats. Python’s pandas library provides powerful tools for this task, particularly the json_normalize function. This article explores how to use Python to explode arrays and flatten nested JSON data for use in analytics, databases, or machine learning pipelines.
JSON’s nested structure is ideal for web APIs and configuration files but becomes cumbersome for:
- Relational databases (e.g., PostgreSQL, MySQL).
- Tabular analysis tools (e.g., pandas, Excel).
- Machine learning models (most require 2D input).
Flattening resolves nested keys and explodes arrays into rows or columns, enabling compatibility with these systems.
Python Implementation with pandas
Key Function: json_normalize
The pandas.json_normalize method recursively flattens nested JSON structures. Its parameters include:
-
data: The JSON input (dict or list of dicts). -
record_path: The key containing the array to explode. -
meta: Fields to preserve as metadata (e.g.,id,name). -
meta_prefix: A prefix for nested metadata keys (e.g.,user_).
Example 1: Exploding a Simple Array
Input JSON:
{
"id": 1,
"name": "John Doe",
"contactIds": [1, 2, 3, 4]
}
Python Code:
import pandas as pd
data = {
"id": 1,
"name": "John Doe",
"contactIds": [1, 2, 3, 4]
}
# Explode the "contactIds" array into rows
df = pd.json_normalize(
data,
record_path="contactIds", # Array to explode
meta=["id", "name"], # Fields to retain
record_prefix="contactId_" # Optional: prefix for exploded values
)
print(df)
Output:
id name contactId_
0 1 John Doe 1
1 1 John Doe 2
2 1 John Doe 3
3 1 John Doe 4
Example 2: Handling Nested Objects
For JSON with nested objects (e.g., address.street), json_normalize automatically concatenates keys:
Input JSON:
{
"id": 1,
"name": "John Doe",
"address": {
"street": "123 Main St",
"city": "Anytown"
}
}
Python Code:
df = pd.json_normalize(data)
print(df)
Output:
id name address.street address.city
0 1 John Doe 123 Main St Anytown
To rename columns:
df.columns = df.columns.str.replace(".", "_")
Example 3: Complex Nesting (Arrays of Objects)
For arrays containing nested objects, json_normalize combines key concatenation and array explosion:
Input JSON:
{
"id": 1,
"orders": [
{"item": "A", "price": 10},
{"item": "B", "price": 20}
]
}
Python Code:
df = pd.json_normalize(
data,
record_path="orders", # Explode the "orders" array
meta=["id"], # Keep "id" as metadata
meta_prefix="user_"
)
print(df)
Output:
item price user_id
0 A 10 1
1 B 20 1
Advanced Customization
Handling Missing Data
Use the errors parameter to ignore or raise errors for missing fields:
pd.json_normalize(data, errors="ignore") # Skip missing keys
Flattening Multiple Levels
For deeply nested JSON, combine json_normalize with recursive functions or custom logic.
Alternatives to pandas
-
flatdictLibrary: Lightweight flattening without dependencies. - Manual Recursion: Custom Python functions for edge cases.
When to Avoid Flattening
- Preserving Hierarchy: Nested JSON is more efficient for tree-like data (e.g., organizational charts).
- APIs: Clients often expect nested responses.
Footnotes
- Use
max_levelinjson_normalizeto control flattening depth (e.g.,max_level=2). - Flattened JSON may increase storage size due to duplicated metadata.
Top comments (0)