## Inspiration
I recently ran into a problem where I needed to process a JSON payload, extract a few fields, transform some values, compute a few derived ones, and produce a new JSON structure.
My first instinct was the usual approach: manually parse the JSON, dig through nested keys, and apply transformations along the way. But it quickly started to feel repetitive and brittle. Every new field meant more extraction logic and more edge cases to handle.
I knew about tools like jq for expression-based JSON querying and Pydantic for schema validation, so I tried combining them. While this helped a bit, I still found myself writing a lot of glue code just to move data from one place to another.
At that point a simple idea occurred to me:
> What if I could just declare what I want from the JSON, and let the rest happen automatically?
I looked around for a library that did this out of the box but couldn’t find something that quite fit. So, naturally, I decided to try building one.
## Understanding the Problem
Say you have a JSON structure of an `Order`:

```json
{
  "id": "order_789",
  "created_at": "2024-02-10T14:21:00Z",
  "customer": {
    "first_name": "Alice",
    "last_name": "Smith",
    "contact": {
      "email": "ALICE@EXAMPLE.COM"
    }
  },
  "items": [
    { "name": "Keyboard", "category": "premium", "price": 120 },
    { "name": "Mouse", "category": "standard", "price": 40 },
    { "name": "Monitor", "category": "premium", "price": 300 }
  ]
}
```
Now I want to transform this into an `OrderSummary` that looks like this:

```json
{
  "order_id": "order_789",
  "customer_email": "alice@example.com",
  "item_count": 3,
  "premium_total": 420,
  "order_label": "ORDER-order_789"
}
```
## Naive Approach

To produce the same transformed JSON using a traditional approach, we would typically write something like this:

```python
import json

with open("orders.json") as f:
    data = json.load(f)

order_id = data["id"]
customer_email = data["customer"]["contact"]["email"].lower()

items = data.get("items", [])
item_count = len(items)

premium_total = sum(
    item["price"]
    for item in items
    if item.get("category") == "premium"
)

order_label = f"ORDER-{data['id']}"

order_summary = {
    "order_id": order_id,
    "customer_email": customer_email,
    "item_count": item_count,
    "premium_total": premium_total,
    "order_label": order_label
}
```
While this works, the extraction logic, transformations, and computed fields are all mixed together, and the schema is implicit. As the JSON structure grows, this approach quickly becomes harder to maintain.
## Declarative Solution

The idea is to declare a model that describes the `OrderSummary`, like this:

```python
from typing import Annotated

# I named the library "jresolve"
from jresolve import (
    JqModel,
    Jq,
    JqMode,
    Transform,
    Computed
)

class OrderSummary(JqModel):
    order_id: Annotated[
        str,
        Jq(".id")
    ]
    customer_email: Annotated[
        str,
        Jq(".customer.contact.email"),
        Transform(str.lower)
    ]
    item_count: Annotated[
        int,
        Jq(".items"),
        Transform(len)
    ]
    premium_total: Annotated[
        float,
        Jq(
            ".items[] | select(.category == \"premium\") | .price",
            mode=JqMode.MANY
        ),
        Transform(sum)
    ]
    order_label: Annotated[
        str,
        Computed(lambda d: f"ORDER-{d['id']}")
    ]
```
The usage for this would be:

```python
order_summary = OrderSummary.from_json(data)
```

The key idea here:

> The model declares how fields are extracted and transformed directly in the type annotation.
## High-Level Architecture

```
Input JSON
    ↓
Resolver (Jq / Computed)
    ↓
Transform Pipeline
    ↓
Collected Field Values
    ↓
Pydantic Model Construction
    ↓
Typed Output
```
## Diving Deeper
Now that we have seen how the declarative model looks from the outside, let's briefly look at the core ideas that make this work internally.
### Analyzing a field

```python
customer_email: Annotated[
    str,
    Jq(".customer.contact.email"),
    Transform(str.lower)
]
```
Just by looking at it I can already tell that:

- The type of the field is `str`
- Its value is extracted using the `jq` expression `".customer.contact.email"`
- Once I have the value, I want to apply a transformation to lowercase it

The intent of the field becomes immediately obvious. And since we are using Pydantic, you get schema validation for free.
### The backbone: `Annotated`

The `Annotated` type acts as the glue that binds all the declarations together.

If we examine the type

```python
order_id: Annotated[
    str,
    Jq(".id"),
    Transform(str.upper)
]
```
It tells us:

```
Type:       str
Metadata:
  Resolvers:  Jq(".id")
  Transforms: str.upper
```
Since Pydantic allows us to access the metadata stored in `Annotated`, we can interpret those declarations and execute them against the JSON input.

So the field effectively becomes:

```
JSON
  ↓
Jq Resolver
  ↓
Transform Pipeline
  ↓
Typed Field
```

A clean pipeline that is easy to reason about and to extend with more operations.
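The metadata lookup itself needs nothing beyond the standard library. Here is a minimal sketch of how the `Annotated` extras can be pulled out of a model's type hints; the `Jq` and `Transform` classes below are simplified stand-ins for the library's markers, not its actual implementation:

```python
from typing import Annotated, get_type_hints, get_args

# Simplified stand-ins for the library's marker classes (illustrative only)
class Jq:
    def __init__(self, expr):
        self.expr = expr

class Transform:
    def __init__(self, fn):
        self.fn = fn

class OrderSummary:
    customer_email: Annotated[str, Jq(".customer.contact.email"), Transform(str.lower)]

# include_extras=True keeps the Annotated metadata instead of stripping it
hints = get_type_hints(OrderSummary, include_extras=True)
base_type, *metadata = get_args(hints["customer_email"])

resolvers = [m for m in metadata if isinstance(m, Jq)]
transforms = [m for m in metadata if isinstance(m, Transform)]
```

In Pydantic v2 the same extras are also exposed on each field's `FieldInfo.metadata`, so a model base class can collect them without calling `get_type_hints` itself.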
### Resolver

While `Annotated` does the heavy lifting in the interface, the `Resolver` does so in the core implementation.

It is an abstract class that provides an interface for the different types of resolvers:

```python
from abc import ABC, abstractmethod
from typing import Any

class Resolver(ABC):
    @abstractmethod
    def resolve(self, data: dict) -> Result[Any, ResolutionError]:
        ...
```
> **NOTE:** A Rust-styled `Result` type is used to return results or errors throughout the implementation. Take a look at `result`.
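To make the note concrete, here is a minimal sketch of what such a Rust-styled `Result` type can look like in Python; the library's actual `Result` may carry more methods (e.g. `map` or `unwrap`):

```python
from dataclasses import dataclass
from typing import Generic, TypeVar, Union

T = TypeVar("T")
E = TypeVar("E")

@dataclass
class Ok(Generic[T]):
    value: T

    def is_ok(self) -> bool:
        return True

    def ok(self):
        return self.value

@dataclass
class Err(Generic[E]):
    error: E

    def is_ok(self) -> bool:
        return False

    def ok(self):
        return None  # no value on the error path

# A Result is either an Ok carrying a value or an Err carrying an error
Result = Union[Ok[T], Err[E]]
```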
Currently there are three implementations of `Resolver`:

- `Jq` extracts values using the `jq` expression syntax
- `Computed` generates a value using a function
- `Pipeline` is used internally and works on a base `Resolver` and a `list[Transform]`
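To illustrate what a resolver does without pulling in `jq` itself, here is a simplified stand-in that handles only plain dotted paths like `.customer.contact.email`. The real `Jq` resolver supports full `jq` syntax and returns a `Result` instead of `None`:

```python
# Simplified stand-in for the Jq resolver: dotted key paths only
# (illustrative, not the library's actual implementation).
class SimplePathResolver:
    def __init__(self, expr: str):
        # ".a.b.c" -> ["a", "b", "c"]
        self.keys = [k for k in expr.split(".") if k]

    def resolve(self, data: dict):
        value = data
        for key in self.keys:
            if not isinstance(value, dict) or key not in value:
                return None  # the real resolver returns Err(ResolutionError)
            value = value[key]
        return value
```

Resolving a field then just means walking the nested dicts one key at a time.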
## How the Model Is Executed

The current execution model looks like this:

```
resolver output (Jq / Computed)
    ↓
transform 1
    ↓
transform 2
    ↓
final value
```
Which translates to:

1. Inspect the model annotations
2. Extract resolvers and transforms from the `Annotated` metadata
3. Build a pipeline
4. Execute the pipeline on the input JSON
5. Construct the Pydantic model

The construction step roughly looks like this:

```python
values = {}
for field_name, field in cls.model_fields.items():
    resolver = build_pipeline_from_field(field)
    if resolver:
        result = resolver.resolve(data)
        values[field_name] = result.ok()
```
## Nested Models

Real JSON structures are rarely flat. Fortunately, `JqModel`s can be nested and are resolved recursively.

```python
class OrderSummary(JqModel):
    # previous fields
    customer: Customer
```

Where `Customer` can look like this:

```python
from typing import Annotated

class Customer(JqModel):
    name: Annotated[
        str,
        Jq(".profile.name.last + ', ' + .profile.name.first")
    ]
    email: Annotated[
        str,
        Jq(".customer.email"),
        Transform(str.lower)
    ]
```
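The recursion itself is simple: resolving a nested model field just means running the child model's field resolution against the JSON document. Here is a library-free sketch of the idea, using plain dicts of resolver callables in place of `JqModel` classes (all names here are illustrative, not jresolve's API):

```python
# Illustrative sketch of recursive resolution; not the library's actual code.
def resolve_model(field_specs: dict, data: dict) -> dict:
    """Resolve each field; a nested dict of specs means a nested model."""
    out = {}
    for name, spec in field_specs.items():
        if isinstance(spec, dict):
            out[name] = resolve_model(spec, data)  # recurse into the child model
        else:
            out[name] = spec(data)  # leaf: spec is a callable resolver
    return out

customer_specs = {
    "email": lambda d: d["customer"]["contact"]["email"].lower(),
}
order_specs = {
    "order_id": lambda d: d["id"],
    "customer": customer_specs,  # nested model
}
```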
## Error Handling

Resolvers return a Rust-style `Result` type that either contains a value or a structured error:

```
Ok(value)
Err(ResolutionError)
```

The pipeline inspects the result and short-circuits if an error occurs, allowing failures to propagate cleanly without raising exceptions.
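A short-circuiting pipeline like this can be sketched in a few lines. The `Ok`/`Err` classes below are minimal stand-ins defined inline so the example is self-contained; the library's actual types differ:

```python
# Minimal Ok/Err stand-ins for a self-contained sketch.
class Ok:
    def __init__(self, value):
        self.value = value

class Err:
    def __init__(self, error):
        self.error = error

def run_pipeline(result, transforms):
    """Apply each transform to an Ok value; stop at the first Err."""
    for fn in transforms:
        if isinstance(result, Err):
            return result  # short-circuit: propagate the error untouched
        try:
            result = Ok(fn(result.value))
        except Exception as exc:
            result = Err(exc)
    return result
```

Because every step returns a `Result`, the caller decides at the end whether to unwrap a value or report an error; no exception ever escapes the pipeline itself.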
## Closing Thoughts
This pattern works particularly well when dealing with:
- Complex API responses
- ETL pipelines
- Data normalization layers
- Event payload transformations
Instead of scattering extraction logic across the codebase, transformations become declarative and centralized in the model definition.
Using `Annotated` types allows us to build a small DSL directly inside Python’s type system, while still benefiting from Pydantic’s validation and typing support.
The result is a system that is:
- Declarative
- Type-safe
- Composable
- Easy to extend
If you’re interested in the full implementation, the code is available in my GitHub repo.
I would love to hear your thoughts about this and what I could have done better!