While reading the FastAPI documentation, I came across various Python libraries for working with JSON. That got me wondering which one is the best, performance-wise (including Python's built-in `json` module). I searched around, found some resources and comparisons, and decided to implement the benchmarks myself. So, this article covers the performance numbers I got while testing these libraries, along with the methodology.
Test Setup
- Ubuntu 24.04 LTS
- RAM - 24 GB
- Python version - 3.12
Libraries tested
- `json` - Built-in Python JSON library; widely used but relatively slow.
- `ujson` - Fast C-based JSON parser; a drop-in replacement for `json`.
- `orjson` - Extremely fast Rust-based library with rich type support.
- `rapidjson` - Python wrapper for RapidJSON (C++); good performance and flexibility.
- `msgspec` - Ultra-fast library with optional typed structs for maximum speed.
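One gotcha when swapping these libraries in: the stdlib `json` module serializes to `str`, while `orjson` (and `msgspec.json.encode`) return `bytes`, so code that assumes a string may need a `.decode()`. A quick sketch (the `orjson` part is guarded since it may not be installed):

```python
import json

payload = {"id": 1, "active": True}

# Stdlib json produces a str...
assert isinstance(json.dumps(payload), str)

# ...while orjson produces bytes (checked only if it is installed):
try:
    import orjson
    assert isinstance(orjson.dumps(payload), bytes)
except ImportError:
    pass
```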
Results
| Library | Serialization Time (s) | Deserialization Time (s) |
|---|---|---|
| json | 1.616786 | 1.616203 |
| ujson | 1.413367 | 1.853332 |
| orjson | 0.417962 | 1.272813 |
| rapidjson | 2.044958 | 1.717067 |
| msgspec | 0.489964 | 0.930834 |
Takeaways
- The top contenders are `orjson` and `msgspec` (duh).
- I personally like using `orjson` when working with FastAPI, since FastAPI has built-in support for an orjson response format (`ORJSONResponse`), making it the more developer-friendly option.
- `msgspec`, on the other hand, is more of a Swiss Army knife: it also supports other formats such as `yaml` and `toml`, and it has a ton of extra validation features, like validating against a Python class (sort of like Pydantic).
Methodology
Generating sample data
A simple script to generate the test data (this can be made more complex for better benchmarking).
```python
import random

def generate_sample_data(n):
    return [
        {
            "id": i,
            "name": f"User{i}",
            "active": bool(i % 2),
            "scores": [random.random() for _ in range(10)],
            "info": {"age": random.randint(20, 40), "city": "City" + str(i)},
        }
        for i in range(n)
    ]

data = generate_sample_data(10_000)
```
Benchmarking function
Store the encoding and decoding functions of every library in a dictionary keyed by library name, then run this function.
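For reference, that `libs` dictionary might be built like this (a sketch, not the article's exact code): start from the stdlib `json` module and add each third-party library only if it imports, so the benchmark still runs when some are missing. `msgspec` is special-cased because its functions are named `encode`/`decode` rather than `dumps`/`loads`.

```python
import json

# Map each library name to its dumps/loads pair. Third-party libraries
# are optional; only the ones that import successfully get benchmarked.
libs = {"json": {"dumps": json.dumps, "loads": json.loads}}

for mod_name in ("ujson", "orjson", "rapidjson", "msgspec"):
    try:
        if mod_name == "msgspec":
            import msgspec
            libs["msgspec"] = {
                "dumps": msgspec.json.encode,
                "loads": msgspec.json.decode,
            }
        else:
            mod = __import__(mod_name)
            libs[mod_name] = {"dumps": mod.dumps, "loads": mod.loads}
    except ImportError:
        print(f"{mod_name} not installed, skipping")
```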
```python
import timeit

import pandas as pd

def benchmark_json_libs(data, runs=10):
    # Reads the module-level `libs` dict of {"dumps": ..., "loads": ...} pairs.
    results = []
    # Pre-serialize the data once per library, so each library's
    # deserialization is benchmarked against its own output.
    pre_serialized = {}
    for name, funcs in libs.items():
        try:
            pre_serialized[name] = funcs["dumps"](data)
        except Exception as e:
            print(f"Serialization failed for {name}: {e}")
            continue
    for name, funcs in libs.items():
        if name not in pre_serialized:
            continue
        try:
            ser_time = timeit.timeit(lambda: funcs["dumps"](data), number=runs)
            deser_time = timeit.timeit(lambda: funcs["loads"](pre_serialized[name]), number=runs)
            results.append({
                "Library": name,
                "Serialization Time (s)": ser_time,
                "Deserialization Time (s)": deser_time,
            })
        except Exception as e:
            print(f"Benchmarking failed for {name}: {e}")
    return pd.DataFrame(results)
```
Visualization
Visualize the results using seaborn and matplotlib.

```python
import matplotlib.pyplot as plt
import seaborn as sns

fig, axes = plt.subplots(1, 2, figsize=(14, 5))
sns.barplot(x="Library", y="Serialization Time (s)", data=results_df, ax=axes[0])
axes[0].set_title("JSON Serialization Time")
sns.barplot(x="Library", y="Deserialization Time (s)", data=results_df, ax=axes[1])
axes[1].set_title("JSON Deserialization Time")
plt.tight_layout()
plt.show()
```
And that's it. Thanks for reading. Bye.