The Fastest Way to Parse JSON in Python (Benchmark of 5 Libraries)

#discuss #performance #python #programming

I needed to parse 2GB of JSON logs per day. Python's built-in json module was too slow. So I benchmarked every JSON library I could find.

The Contenders

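json: Python stdlib
ujson: C extension, largely a drop-in replacement
orjson: Rust-based, fastest overall in the results below
rapidjson: Python wrapper around the C++ RapidJSON library (python-rapidjson on PyPI)
simdjson: SIMD-accelerated parsing (pysimdjson on PyPI)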

The Benchmark

import json
import time

# Generate test data: 100K JSON objects
data = [{"id": i, "name": f"user_{i}", "email": f"user_{i}@test.com", 
         "scores": [i*0.1, i*0.2, i*0.3], "active": i % 2 == 0} 
        for i in range(100_000)]
json_str = json.dumps(data)

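def bench(name, parse_fn, dumps_fn=None, iterations=10):
    # Parse benchmark: mean wall-clock time per call
    start = time.perf_counter()
    for _ in range(iterations):
        parse_fn(json_str)
    parse_time = (time.perf_counter() - start) / iterations

    # Serialize benchmark (skipped for parse-only libraries such as simdjson)
    dumps_ms = "   N/A"
    if dumps_fn is not None:
        start = time.perf_counter()
        for _ in range(iterations):
            dumps_fn(data)
        dumps_time = (time.perf_counter() - start) / iterations
        dumps_ms = f"{dumps_time*1000:6.1f}ms"

    print(f"{name:12} | parse: {parse_time*1000:6.1f}ms | dumps: {dumps_ms}")

# Baseline run; the other libraries are passed in the same way
bench("json", json.loads, json.dumps)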
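To run the same measurement across every library that happens to be installed, a driver along these lines could work. This is a sketch, not the exact script behind the numbers: make_data, mean_seconds, and run_benchmarks are hypothetical helper names, the defaults are smaller than the 100K objects used above, and any library that fails to import is skipped.

```python
import importlib
import json
import time


def make_data(n_objects):
    """Build records with the same shape as the benchmark data."""
    return [
        {"id": i, "name": f"user_{i}", "email": f"user_{i}@test.com",
         "scores": [i * 0.1, i * 0.2, i * 0.3], "active": i % 2 == 0}
        for i in range(n_objects)
    ]


def mean_seconds(fn, arg, iterations):
    """Mean wall-clock seconds per call of fn(arg)."""
    start = time.perf_counter()
    for _ in range(iterations):
        fn(arg)
    return (time.perf_counter() - start) / iterations


def run_benchmarks(n_objects=10_000, iterations=5):
    data = make_data(n_objects)
    json_str = json.dumps(data)
    results = {}
    # (module name, measure dumps?); simdjson is treated as parse-only
    candidates = [("json", True), ("ujson", True), ("rapidjson", True),
                  ("orjson", True), ("simdjson", False)]
    for module_name, can_dump in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # library not installed; skip it
        parse_s = mean_seconds(module.loads, json_str, iterations)
        dumps_s = mean_seconds(module.dumps, data, iterations) if can_dump else None
        results[module_name] = (parse_s, dumps_s)
    return results


if __name__ == "__main__":
    for name, (parse_s, dumps_s) in run_benchmarks().items():
        dumps_text = f"{dumps_s * 1000:6.1f}ms" if dumps_s is not None else "   N/A"
        print(f"{name:12} | parse: {parse_s * 1000:6.1f}ms | dumps: {dumps_text}")
```

Returning the timings as a dict instead of printing inside the loop keeps the measurement separate from the formatting, which makes it easier to compute speedups afterwards.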

Results (M1 MacBook Air, Python 3.12)

Library	Parse (ms)	Serialize (ms)	Speedup vs stdlib (parse / serialize)
json	142	98	1.0x
ujson	89	45	1.6x / 2.2x
rapidjson	78	52	1.8x / 1.9x
orjson	31	18	4.6x / 5.4x
simdjson	28	N/A (parse only)	5.1x

orjson wins for most use cases. simdjson is slightly faster for parsing, but it's parse-only — you can't serialize with it.
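For reference, the speedup column is just the stdlib time divided by each library's time, rounded to one decimal. A small sketch that reproduces it from the table's timings (the speedups helper is a hypothetical name; None marks a value that was not measured):

```python
# Timings in milliseconds, copied from the results table.
timings_ms = {
    "json": (142, 98),
    "ujson": (89, 45),
    "rapidjson": (78, 52),
    "orjson": (31, 18),
    "simdjson": (28, None),  # parse only
}


def speedups(timings, baseline="json"):
    """Return {library: (parse speedup, serialize speedup)} relative to baseline."""
    base_parse, base_dumps = timings[baseline]
    out = {}
    for name, (parse_ms, dumps_ms) in timings.items():
        parse_x = round(base_parse / parse_ms, 1)
        dumps_x = round(base_dumps / dumps_ms, 1) if dumps_ms is not None else None
        out[name] = (parse_x, dumps_x)
    return out


for name, (parse_x, dumps_x) in speedups(timings_ms).items():
    dumps_text = f"{dumps_x}x" if dumps_x is not None else "N/A"
    print(f"{name:10} parse {parse_x}x | serialize {dumps_text}")
```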

My Recommendation

# Close to a drop-in replacement: same loads/dumps names, but dumps returns bytes
import orjson

# Serialize (returns bytes, not str)
json_bytes = orjson.dumps(data)

# Parse (accepts bytes or str)
data = orjson.loads(json_bytes)

# With options
json_bytes = orjson.dumps(data, option=orjson.OPT_INDENT_2 | orjson.OPT_SORT_KEYS)

Install: pip install orjson

The Gotchas

orjson returns bytes, not str. If you need a string:

json_str = orjson.dumps(data).decode('utf-8')

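orjson accepts a default callable, but it has no cls parameter for JSONEncoder subclasses, and default is only called for types orjson can't serialize natively. datetime is already handled natively, so the hook matters for types such as Decimal:

from decimal import Decimal

def default(obj):
    # Called only for types orjson does not serialize natively
    if isinstance(obj, Decimal):
        return str(obj)
    raise TypeError

orjson.dumps(data, default=default)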
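The stdlib json.dumps takes a default callable with the same contract (return something serializable or raise TypeError), so one hook can serve both libraries during a migration. A minimal sketch, assuming a made-up payload; orjson is treated as optional so the stdlib path still runs without it:

```python
import json
from decimal import Decimal

try:
    import orjson  # optional; the stdlib call below works without it
except ImportError:
    orjson = None


def default(obj):
    """Fallback for types the serializer does not handle natively."""
    if isinstance(obj, Decimal):
        return str(obj)
    if isinstance(obj, set):
        return sorted(obj)
    raise TypeError(f"Cannot serialize {type(obj).__name__}")


# Hypothetical payload with two types neither library serializes by default
payload = {"price": Decimal("9.99"), "tags": {"b", "a"}}

stdlib_out = json.dumps(payload, default=default)
print(stdlib_out)

if orjson is not None:
    # Same hook, but the result is bytes and has to be decoded for a str
    orjson_out = orjson.dumps(payload, default=default).decode("utf-8")
    print(orjson_out)
```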

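simdjson requires padding. The underlying C++ parser needs a few extra bytes of allocated space after the end of the input. The Python bindings (pysimdjson on PyPI, imported as simdjson) handle this automatically when you use simdjson.Parser(), but it's worth knowing.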
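In practice the Parser object hides the padding entirely. A rough sketch of pulling a single value out of a document, assuming a recent pysimdjson release where the parsed document exposes at_pointer; first_score and the sample bytes are made up for illustration, and the function falls back to the stdlib when simdjson is not installed:

```python
import json

try:
    import simdjson  # installed from PyPI as "pysimdjson"
except ImportError:
    simdjson = None

# Hypothetical sample document
raw = b'{"user": {"id": 7, "scores": [0.7, 1.4]}}'


def first_score(raw_bytes):
    """Return the first entry of user.scores from a JSON document."""
    if simdjson is not None:
        parser = simdjson.Parser()      # padding is handled internally
        doc = parser.parse(raw_bytes)   # lazy proxy over the parsed document
        return doc.at_pointer("/user/scores/0")
    # Fallback: parse the whole document with the stdlib
    return json.loads(raw_bytes)["user"]["scores"][0]


print(first_score(raw))
```

Fetching one value by JSON pointer avoids converting the whole document into Python objects, which is where much of the parse-only speed advantage comes from.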

When to NOT Switch

Small JSON (<1KB): stdlib is fine; parse time is negligible at that size, so the extra dependency isn't worth it
No C extensions allowed: some environments restrict compiled packages
Need 100% stdlib compatibility: orjson's dumps() returns bytes, which might break existing code
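If some of these constraints only apply in certain environments, one option is a thin wrapper that uses orjson when it imports and falls back to the stdlib otherwise, always returning str from dumps. This is a sketch under those assumptions; the loads and dumps wrapper functions are hypothetical helpers, and the two backends can still differ in details such as float formatting:

```python
import json

try:
    import orjson
except ImportError:  # e.g. compiled packages are not allowed here
    orjson = None


def loads(payload):
    """Parse JSON from str or bytes with whichever backend is available."""
    if orjson is not None:
        return orjson.loads(payload)
    return json.loads(payload)


def dumps(obj):
    """Serialize to str, matching the return type of json.dumps."""
    if orjson is not None:
        return orjson.dumps(obj).decode("utf-8")
    # Compact separators roughly mirror orjson's whitespace-free output
    return json.dumps(obj, separators=(",", ":"), ensure_ascii=False)


text = dumps({"id": 1, "tags": ["a", "b"]})
print(type(text).__name__, text)
print(loads(text))
```

Existing callers that expect a str keep working, and the decode cost is only paid when orjson is actually in use.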

What Do You Use?

Are you still using the stdlib json module? Have you tried orjson? Any other libraries I should benchmark?

I benchmark developer tools and build security scanners. More benchmarks coming soon.
