While reading the FastAPI documentation, I came across various Python libraries for working with JSON. That got me wondering which one is the best, performance-wise (including Python's built-in `json` module). I searched around, found some resources and comparisons, and decided to implement the benchmarks myself. So, this article covers the performance numbers I got while testing these libraries, along with the methodology.
Test Setup
- Ubuntu 24.04 LTS
- RAM - 24 GB
- Python version - 3.12
Libraries tested
- `json` - Built-in Python JSON library; widely used but relatively slow.
- `ujson` - Fast C-based JSON parser; a drop-in replacement for `json`.
- `orjson` - Extremely fast Rust-based library with rich type support.
- `rapidjson` - Python wrapper for RapidJSON (C++); good performance and flexibility.
- `msgspec` - Ultra-fast library with optional typed structs for maximum speed.
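One gotcha when swapping these libraries in: the stdlib `json` module serializes to `str`, while `orjson` (and `msgspec.json.encode`) return `bytes`, so code that assumes a string may need a `.decode()`. A quick sketch (the `orjson` part is guarded since it may not be installed):

```python
import json

payload = {"id": 1, "active": True}

# Stdlib json produces a str...
assert isinstance(json.dumps(payload), str)

# ...while orjson produces bytes (checked only if it is installed):
try:
    import orjson
    assert isinstance(orjson.dumps(payload), bytes)
except ImportError:
    pass
```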
Results
| Library | Serialization Time (s) | Deserialization Time (s) |
|---|---|---|
| json | 1.616786 | 1.616203 |
| ujson | 1.413367 | 1.853332 |
| orjson | 0.417962 | 1.272813 |
| rapidjson | 2.044958 | 1.717067 |
| msgspec | 0.489964 | 0.930834 |
Takeaways
- The top contenders are `orjson` and `msgspec` (duh).
- I personally like using `orjson` when working with FastAPI, since FastAPI has built-in support for an orjson response format (`ORJSONResponse`), making it the more developer-friendly option.
- `msgspec`, on the other hand, is more of a Swiss Army knife: it also supports other formats such as `yaml` and `toml`, and it has a ton of extra validation features, like validating against a Python class (sort of like Pydantic).
Methodology
Generating sample data
A simple script to generate the test data (this can be made more complex for better benchmarking).
```python
import random

def generate_sample_data(n):
    return [
        {
            "id": i,
            "name": f"User{i}",
            "active": bool(i % 2),
            "scores": [random.random() for _ in range(10)],
            "info": {"age": random.randint(20, 40), "city": "City" + str(i)},
        }
        for i in range(n)
    ]

data = generate_sample_data(10_000)
```
Benchmarking function
Store the encoding and decoding functions of every library in a dictionary keyed by library name, then run this function.
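For reference, that `libs` dictionary might be built like this (a sketch, not the article's exact code): start from the stdlib `json` module and add each third-party library only if it imports, so the benchmark still runs when some are missing. `msgspec` is special-cased because its functions are named `encode`/`decode` rather than `dumps`/`loads`.

```python
import json

# Map each library name to its dumps/loads pair. Third-party libraries
# are optional; only the ones that import successfully get benchmarked.
libs = {"json": {"dumps": json.dumps, "loads": json.loads}}

for mod_name in ("ujson", "orjson", "rapidjson", "msgspec"):
    try:
        if mod_name == "msgspec":
            import msgspec
            libs["msgspec"] = {
                "dumps": msgspec.json.encode,
                "loads": msgspec.json.decode,
            }
        else:
            mod = __import__(mod_name)
            libs[mod_name] = {"dumps": mod.dumps, "loads": mod.loads}
    except ImportError:
        print(f"{mod_name} not installed, skipping")
```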
```python
import timeit

import pandas as pd

def benchmark_json_libs(data, runs=10):
    # Reads the module-level `libs` dict of {"dumps": ..., "loads": ...} pairs.
    results = []
    # Pre-serialize the data once per library, so each library's
    # deserialization is benchmarked against its own output.
    pre_serialized = {}
    for name, funcs in libs.items():
        try:
            pre_serialized[name] = funcs["dumps"](data)
        except Exception as e:
            print(f"Serialization failed for {name}: {e}")
            continue
    for name, funcs in libs.items():
        if name not in pre_serialized:
            continue
        try:
            ser_time = timeit.timeit(lambda: funcs["dumps"](data), number=runs)
            deser_time = timeit.timeit(lambda: funcs["loads"](pre_serialized[name]), number=runs)
            results.append({
                "Library": name,
                "Serialization Time (s)": ser_time,
                "Deserialization Time (s)": deser_time,
            })
        except Exception as e:
            print(f"Benchmarking failed for {name}: {e}")
    return pd.DataFrame(results)
```
Visualization
Visualize the results using seaborn and matplotlib.

```python
import matplotlib.pyplot as plt
import seaborn as sns

fig, axes = plt.subplots(1, 2, figsize=(14, 5))
sns.barplot(x="Library", y="Serialization Time (s)", data=results_df, ax=axes[0])
axes[0].set_title("JSON Serialization Time")
sns.barplot(x="Library", y="Deserialization Time (s)", data=results_df, ax=axes[1])
axes[1].set_title("JSON Deserialization Time")
plt.tight_layout()
plt.show()
```
And that's it. Thanks for reading. Bye.