Kanak Tanwar

Benchmarking Python JSON libraries

While reading the FastAPI documentation, I came across various Python libraries for working with JSON. That got me wondering which one is the best performance-wise, including Python's built-in json module.

I surfed the net, found some resources and comparisons, and decided to run the benchmarks myself. So this article covers the performance numbers I got when testing these libraries, along with the methodology.


Test Setup

  • Ubuntu 24.04 LTS
  • RAM - 24 GB
  • Python version - 3.12

Libraries tested

  • json - Built-in Python JSON library; widely used but relatively slow.
  • ujson - Fast C-based JSON parser; a drop-in replacement for json.
  • orjson - Extremely fast Rust-based library with rich type support.
  • rapidjson - Python wrapper for RapidJSON (C++); good performance and flexibility.
  • msgspec - Ultra-fast library with optional typed structs for maximum speed.
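
One caveat before the numbers: these APIs are close but not fully interchangeable. For example, orjson.dumps and msgspec.json.encode return bytes rather than str, so a literal drop-in swap for json may need a decode step. A minimal sketch:

import json
import orjson

payload = {"id": 1, "name": "User1"}

print(type(json.dumps(payload)))    # <class 'str'>
print(type(orjson.dumps(payload)))  # <class 'bytes'>

# Decode explicitly if downstream code expects a str
as_str = orjson.dumps(payload).decode("utf-8")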

Results

Library      Serialization Time (s)    Deserialization Time (s)
json         1.616786                  1.616203
ujson        1.413367                  1.853332
orjson       0.417962                  1.272813
rapidjson    2.044958                  1.717067
msgspec      0.489964                  0.930834

[Bar graphs of the serialization and deserialization benchmark times per library]

Takeaways

  • The top contenders are orjson and msgspec (duh).
  • I personally like to use orjson when working with FastAPI, since FastAPI ships a built-in ORJSONResponse class, which makes it the more developer-friendly option here (quick sketches follow this list).
  • msgspec, on the other hand, is a Swiss Army knife: it supports other formats as well, like YAML and TOML, and it has a ton of extra validation features, like decoding straight into a typed Python class (sort of like Pydantic).
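
To make those two bullets concrete, here are minimal sketches; the route and the User fields are made up for illustration. First, telling FastAPI to serialize every response with orjson via its built-in ORJSONResponse:

from fastapi import FastAPI
from fastapi.responses import ORJSONResponse  # requires orjson to be installed

# Every endpoint now serializes its response with orjson by default
app = FastAPI(default_response_class=ORJSONResponse)

@app.get("/users")
def list_users():
    return [{"id": 1, "name": "User1", "active": True}]

And msgspec decoding straight into a typed class, validating the payload on the way in:

import msgspec

class User(msgspec.Struct):
    id: int
    name: str
    active: bool

# Raises msgspec.ValidationError if the payload doesn't match the schema
user = msgspec.json.decode(b'{"id": 1, "name": "User1", "active": true}', type=User)
print(user)  # User(id=1, name='User1', active=True)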

Methodology

Generating sample data

A simple script to generate the test data (it can be made more complex for a more realistic benchmark).

import random

def generate_sample_data(n):
    # Each record mixes ints, strings, booleans, a list of floats, and a nested dict
    return [
        {
            "id": i,
            "name": f"User{i}",
            "active": bool(i % 2),
            "scores": [random.random() for _ in range(10)],
            "info": {"age": random.randint(20, 40), "city": "City" + str(i)},
        }
        for i in range(n)
    ]

data = generate_sample_data(10_000)

Benchmarking function

Store the encoding and decoding callables of every library in a dictionary keyed by library name, then run the function below.
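
The post doesn't show that dictionary itself, so here is one plausible reconstruction (each entry pairs a library's encoder with its own decoder, which is all the benchmark below needs):

import json
import ujson
import orjson
import rapidjson
import msgspec

libs = {
    "json": {"dumps": json.dumps, "loads": json.loads},
    "ujson": {"dumps": ujson.dumps, "loads": ujson.loads},
    "orjson": {"dumps": orjson.dumps, "loads": orjson.loads},                 # bytes out
    "rapidjson": {"dumps": rapidjson.dumps, "loads": rapidjson.loads},
    "msgspec": {"dumps": msgspec.json.encode, "loads": msgspec.json.decode},  # bytes out
}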

import timeit

import pandas as pd

def benchmark_json_libs(data, runs=10):
    results = []

    # Pre-serialize data for each lib
    pre_serialized = {}
    for name, funcs in libs.items():
        try:
            pre_serialized[name] = funcs["dumps"](data)
        except Exception as e:
            print(f"Serialization failed for {name}: {e}")
            continue

    for name, funcs in libs.items():
        if name not in pre_serialized:
            continue

        try:
            ser_time = timeit.timeit(lambda: funcs["dumps"](data), number=runs)
            deser_time = timeit.timeit(lambda: funcs["loads"](pre_serialized[name]), number=runs)
            results.append({
                "Library": name,
                "Serialization Time (s)": ser_time,
                "Deserialization Time (s)": deser_time
            })
        except Exception as e:
            print(f"Benchmarking failed for {name}: {e}")

    return pd.DataFrame(results)
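With everything wired up, running the benchmark is a single call (reusing the data and libs names defined above):

results_df = benchmark_json_libs(data)
print(results_df)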

Visualization

Visualize the results using Seaborn and Matplotlib.

import matplotlib.pyplot as plt
import seaborn as sns

# One figure, two side-by-side bar charts
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

sns.barplot(x="Library", y="Serialization Time (s)", data=results_df, ax=axes[0])
axes[0].set_title("JSON Serialization Time")

sns.barplot(x="Library", y="Deserialization Time (s)", data=results_df, ax=axes[1])
axes[1].set_title("JSON Deserialization Time")

plt.tight_layout()
plt.show()

And that's it. Thanks for reading. Bye.
