DEV Community

Alex Spinov

The Fastest Way to Parse JSON in Python (Benchmark of 5 Libraries)

I needed to parse 2GB of JSON logs per day. Python's built-in json module was too slow. So I benchmarked every JSON library I could find.

The Contenders

  1. json: Python stdlib
  2. ujson: C extension, drop-in replacement
  3. orjson: Rust-based, fastest all-round option in the results below
  4. rapidjson: C++ wrapper (PyPI package python-rapidjson)
  5. simdjson: SIMD-accelerated parsing (PyPI package pysimdjson)

The Benchmark

import json
import time

# Generate test data: 100K JSON objects
data = [{"id": i, "name": f"user_{i}", "email": f"user_{i}@test.com", 
         "scores": [i*0.1, i*0.2, i*0.3], "active": i % 2 == 0} 
        for i in range(100_000)]
json_str = json.dumps(data)

def bench(name, parse_fn, dumps_fn, iterations=10):
    # Parse benchmark
    start = time.perf_counter()
    for _ in range(iterations):
        parse_fn(json_str)
    parse_time = (time.perf_counter() - start) / iterations

    # Serialize benchmark
    start = time.perf_counter()
    for _ in range(iterations):
        dumps_fn(data)
    dumps_time = (time.perf_counter() - start) / iterations

    print(f"{name:12} | parse: {parse_time*1000:6.1f}ms | dumps: {dumps_time*1000:6.1f}ms")
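
The bench function above is defined but never called. Here is a minimal driver sketch, assuming each optional library exposes a json-style loads function and that any of them may be missing from the environment. The names time_call and run_all are made up for this example, and the payload is much smaller than the 100K-object list so it finishes quickly:

```python
import importlib
import json
import time


def time_call(fn, arg, iterations=3):
    """Return the mean wall-clock seconds per call of fn(arg)."""
    start = time.perf_counter()
    for _ in range(iterations):
        fn(arg)
    return (time.perf_counter() - start) / iterations


def run_all(json_str, iterations=3):
    """Time loads() for every candidate library that can be imported."""
    results = {}
    for name in ("json", "ujson", "orjson", "rapidjson", "simdjson"):
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue  # library not installed, skip it
        loads = getattr(module, "loads", None)
        if loads is None:
            continue  # module has no json-style loads(), skip it
        results[name] = time_call(loads, json_str, iterations)
    return results


sample = json.dumps([{"id": i, "name": f"user_{i}"} for i in range(1_000)])
results = run_all(sample)
for name, seconds in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name:12} | parse: {seconds * 1000:6.2f}ms")
```

Only the libraries that import successfully show up in the output, so the same script works on a machine that has just the stdlib.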

Results (M1 MacBook Air, Python 3.12)

Library   | Parse (ms) | Serialize (ms)   | Speedup vs stdlib (parse / serialize)
json      | 142        | 98               | 1.0x
ujson     | 89         | 45               | 1.6x / 2.2x
rapidjson | 78         | 52               | 1.8x / 1.9x
orjson    | 31         | 18               | 4.6x / 5.4x
simdjson  | 28         | N/A (parse only) | 5.1x (parse)

orjson wins for most use cases. simdjson is slightly faster for parsing, but it's parse-only — you can't serialize with it.
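
The speedup figures are just the stdlib time divided by each library's time, rounded to one decimal. A small sketch that recomputes them from the numbers in the results table (the timings_ms and speedup names are only for this example):

```python
# Timings in milliseconds, copied from the results table: (parse, serialize)
timings_ms = {
    "json": (142, 98),
    "ujson": (89, 45),
    "rapidjson": (78, 52),
    "orjson": (31, 18),
    "simdjson": (28, None),  # parse only, no serialize timing
}


def speedup(baseline_ms, library_ms):
    """Return how many times faster than the baseline, or None if not measured."""
    if library_ms is None:
        return None
    return round(baseline_ms / library_ms, 1)


base_parse, base_dumps = timings_ms["json"]
speedups = {
    name: (speedup(base_parse, parse), speedup(base_dumps, dumps))
    for name, (parse, dumps) in timings_ms.items()
}

for name, (parse_x, dumps_x) in speedups.items():
    dumps_text = "N/A" if dumps_x is None else f"{dumps_x}x"
    print(f"{name:10} parse {parse_x}x | serialize {dumps_text}")
```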

My Recommendation

# orjson offers the same loads/dumps calls, but it is not a strict drop-in (see The Gotchas)
import orjson

json_bytes = b'{"id": 1, "name": "user_1", "active": true}'

# Parse (accepts bytes or str)
data = orjson.loads(json_bytes)

# Serialize (returns bytes, not str)
json_bytes = orjson.dumps(data)

# With options
json_bytes = orjson.dumps(data, option=orjson.OPT_INDENT_2 | orjson.OPT_SORT_KEYS)

Install: pip install orjson

The Gotchas

orjson returns bytes, not str. If you need a string:

json_str = orjson.dumps(data).decode('utf-8')
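
If existing code expects a str from dumps, one way to contain the difference is a small wrapper. This is a sketch, assuming you also want a stdlib fallback on machines where orjson is not installed. dumps_str is a hypothetical helper name, not part of orjson:

```python
import json

try:
    import orjson
except ImportError:  # fall back to the stdlib when orjson is not installed
    orjson = None


def dumps_str(obj):
    """Serialize obj to a str, using orjson when it is available."""
    if orjson is not None:
        return orjson.dumps(obj).decode("utf-8")
    # Compact separators and ensure_ascii=False roughly mirror
    # orjson's compact UTF-8 output
    return json.dumps(obj, separators=(",", ":"), ensure_ascii=False)


print(dumps_str({"a": 1, "b": [True, None]}))
```

Callers keep working with str, and the bytes-versus-str decision lives in one place.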

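orjson's default hook is only called for types it cannot serialize natively, and dumps() has no cls parameter for a custom encoder class. For custom objects:

from datetime import datetime
import orjson

def default(obj):
    # orjson already serializes datetime natively; this branch only runs
    # when option=orjson.OPT_PASSTHROUGH_DATETIME is passed to dumps()
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError

orjson.dumps(data, default=default)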
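
To see the hook fire, use types that orjson does not handle natively. This is a sketch, assuming Decimal values should become strings and sets should become sorted lists (both are choices made for this example). It includes a stdlib fallback so it runs without orjson:

```python
import json
from decimal import Decimal


def default(obj):
    """Convert types the encoder cannot handle on its own."""
    if isinstance(obj, Decimal):
        return str(obj)  # keep exact decimal digits as a string
    if isinstance(obj, set):
        return sorted(obj)  # sets have no order, so sort for stable output
    raise TypeError(f"Type is not JSON serializable: {type(obj).__name__}")


payload = {"price": Decimal("9.99"), "tags": {"b", "a"}}

try:
    import orjson
    encoded = orjson.dumps(payload, default=default).decode("utf-8")
except ImportError:
    # The stdlib accepts the same kind of default callable
    encoded = json.dumps(payload, default=default, separators=(",", ":"))

print(encoded)
```

Raising TypeError for anything unrecognized matters: it keeps unsupported objects from being silently dropped or mis-encoded.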

simdjson requires padding. The input must have extra bytes at the end — the library handles this automatically with simdjson.Parser(), but it's worth knowing.

When NOT to Switch

  • Small JSON (<1KB): stdlib is fine, the overhead of importing orjson isn't worth it
  • No compiled extensions allowed: some environments restrict native packages, and orjson is compiled Rust rather than pure Python
  • Need 100% stdlib compatibility: orjson's dumps() returns bytes, which might break existing code

What Do You Use?

Are you still using the stdlib json module? Have you tried orjson? Any other libraries I should benchmark?


I benchmark developer tools and build security scanners. More benchmarks coming soon.
