I needed to parse 2GB of JSON logs per day. Python's built-in json module was too slow. So I benchmarked every JSON library I could find.
The Contenders
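- json: Python stdlib
- ujson: C extension, drop-in replacement
- orjson: Rust-based, fastest option
- rapidjson: the python-rapidjson package, a wrapper around the C++ RapidJSON library
- simdjson: the pysimdjson package, SIMD-accelerated parsing via the C++ simdjson library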
The Benchmark
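```python
import json
import time

import orjson
import rapidjson  # pip install python-rapidjson
import simdjson  # pip install pysimdjson
import ujson

# Generate test data: 100K JSON objects
data = [{"id": i, "name": f"user_{i}", "email": f"user_{i}@test.com",
         "scores": [i * 0.1, i * 0.2, i * 0.3], "active": i % 2 == 0}
        for i in range(100_000)]
json_str = json.dumps(data)

def bench(name, parse_fn, dumps_fn=None, iterations=10):
    # Parse benchmark: average wall-clock time per call
    start = time.perf_counter()
    for _ in range(iterations):
        parse_fn(json_str)
    parse_time = (time.perf_counter() - start) / iterations
    # Serialize benchmark (skipped for parse-only libraries)
    if dumps_fn is None:
        print(f"{name:12} | parse: {parse_time*1000:6.1f}ms | dumps:    N/A")
        return
    start = time.perf_counter()
    for _ in range(iterations):
        dumps_fn(data)
    dumps_time = (time.perf_counter() - start) / iterations
    print(f"{name:12} | parse: {parse_time*1000:6.1f}ms | dumps: {dumps_time*1000:6.1f}ms")

bench("json", json.loads, json.dumps)
bench("ujson", ujson.loads, ujson.dumps)
bench("rapidjson", rapidjson.loads, rapidjson.dumps)
bench("orjson", orjson.loads, orjson.dumps)
bench("simdjson", simdjson.loads)  # parse only
```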
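Averaging ten runs, as bench() does, lets a single slow run (a background process, an OS scheduling hiccup) pull the mean up. A common alternative is to report the best of several batches. The sketch below does that with the stdlib timeit module; the helper name best_of and its repeat/number defaults are my own choices, not part of the benchmark above. Note that timeit turns off garbage collection while timing by default, so its numbers can come out a bit lower than in a real workload.

```python
import json
import timeit
from typing import Any, Callable


def best_of(fn: Callable[[Any], Any], arg: Any, repeat: int = 5, number: int = 10) -> float:
    """Return the fastest average time per call, in ms, across `repeat` batches."""
    # timeit.repeat runs `number` calls per batch and returns one total per batch
    totals = timeit.repeat(lambda: fn(arg), repeat=repeat, number=number)
    # The fastest batch is the one least disturbed by other activity on the machine
    return min(totals) / number * 1000


if __name__ == "__main__":
    data = [{"id": i, "name": f"user_{i}"} for i in range(10_000)]
    json_str = json.dumps(data)
    print(f"json parse: {best_of(json.loads, json_str):.1f}ms")
    print(f"json dumps: {best_of(json.dumps, data):.1f}ms")
```

Swapping in another library is the same call with a different function, e.g. best_of(orjson.loads, json_str).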
Results (M1 MacBook Air, Python 3.12)
| Library | Parse (ms) | Serialize (ms) | Speedup vs stdlib (parse / serialize) |
|---|---|---|---|
| json | 142 | 98 | 1.0x / 1.0x |
| ujson | 89 | 45 | 1.6x / 2.2x |
| rapidjson | 78 | 52 | 1.8x / 1.9x |
| orjson | 31 | 18 | 4.6x / 5.4x |
| simdjson | 28 | N/A (parse only) | 5.1x / N/A |
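The speedup column is the stdlib time divided by each library's time. This small sketch recomputes it from the table's millisecond values (the RESULTS dict and the speedups helper are just for illustration):

```python
# Parse / serialize times in ms, copied from the results table
RESULTS = {"json": (142, 98), "ujson": (89, 45), "rapidjson": (78, 52), "orjson": (31, 18)}


def speedups(results, baseline="json"):
    """Return {name: (parse_speedup, dumps_speedup)} relative to the baseline library."""
    base_parse, base_dumps = results[baseline]
    return {name: (round(base_parse / p, 1), round(base_dumps / d, 1))
            for name, (p, d) in results.items()}


for name, (parse_x, dumps_x) in speedups(RESULTS).items():
    print(f"{name:10} {parse_x}x / {dumps_x}x")
```

For orjson that works out to 142 / 31 ≈ 4.6x for parsing and 98 / 18 ≈ 5.4x for serializing.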
orjson wins for most use cases. simdjson is slightly faster at parsing (28 ms vs 31 ms), but it's parse-only: you can't serialize with it, so you'd still need a second library for output.
My Recommendation
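```python
# orjson is close to a drop-in replacement: loads() matches, but dumps() returns bytes
import orjson

# Example data
data = {"id": 1, "name": "user_1", "active": True}

# Serialize (returns bytes, not str)
json_bytes = orjson.dumps(data)

# Parse (accepts bytes, bytearray, memoryview, or str)
data = orjson.loads(json_bytes)

# With options: 2-space indentation and sorted keys
json_bytes = orjson.dumps(data, option=orjson.OPT_INDENT_2 | orjson.OPT_SORT_KEYS)
```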
Install: pip install orjson
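The intro mentions 2GB of JSON logs per day but not their layout. Assuming they're newline-delimited (JSON Lines, one object per line), the swap can be wrapped in a small reader like this sketch. iter_json_lines is a hypothetical helper, not part of orjson; the parser is passed in so the loop doesn't change when you switch libraries.

```python
import json
from typing import Any, Callable, Iterator


def iter_json_lines(path: str, loads: Callable[[bytes], Any] = json.loads) -> Iterator[Any]:
    """Yield one parsed record per non-empty line of a JSON Lines file."""
    # Binary mode: json.loads and orjson.loads both accept bytes directly
    with open(path, "rb") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline
                yield loads(line)
```

To switch parsers, pass loads=orjson.loads, e.g. for record in iter_json_lines("app.jsonl", loads=orjson.loads) (the filename is a placeholder). Reading in binary mode also skips the Python-level decode of each line to str.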
The Gotchas
orjson.dumps() returns bytes, not str. If you need a string:

```python
json_str = orjson.dumps(data).decode("utf-8")
```
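orjson supports a default callable, but not the stdlib's cls= encoder subclasses. It also serializes datetime, dataclass, and UUID instances natively, so default is only called for types orjson doesn't already handle, such as Decimal. (If you want to format datetimes yourself, pass option=orjson.OPT_PASSTHROUGH_DATETIME so they reach default.)

```python
from decimal import Decimal
import orjson

def default(obj):
    # Only called for types orjson can't serialize natively
    if isinstance(obj, Decimal):
        return str(obj)
    raise TypeError(f"Type is not JSON serializable: {type(obj).__name__}")

orjson.dumps({"price": Decimal("19.99")}, default=default)  # b'{"price":"19.99"}'
```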
simdjson requires padding: the underlying C++ library expects extra bytes after the end of the input buffer. The Python binding handles this automatically when you parse through simdjson.Parser(), but it's worth knowing.
When to NOT Switch
- Small JSON (<1KB): stdlib is fine; the time saved per call is negligible, so an extra dependency isn't worth it
- No compiled extensions allowed: some environments restrict compiled packages, and every library here except the stdlib ships native code
- Need 100% stdlib compatibility: orjson's dumps() returns bytes, which might break existing code
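For the last two bullets, one middle ground is a small wrapper that uses orjson when it's importable, falls back to the stdlib otherwise, and always returns str from dumps(). This is a sketch: the module-level dumps/loads names are my own, and the fallback's separators/ensure_ascii settings are chosen to match orjson's compact UTF-8 output for simple data.

```python
import json
from typing import Any

try:
    import orjson

    def dumps(obj: Any) -> str:
        # orjson returns UTF-8 bytes; decode so callers still get a str
        return orjson.dumps(obj).decode("utf-8")

    loads = orjson.loads
except ImportError:  # no compiled packages available: fall back to the stdlib
    def dumps(obj: Any) -> str:
        # Match orjson's compact, non-ASCII-escaping output
        return json.dumps(obj, separators=(",", ":"), ensure_ascii=False)

    loads = json.loads
```

This only smooths over the bytes-vs-str difference. Other behavior still differs; for example, orjson rejects non-str dict keys unless you pass OPT_NON_STR_KEYS, while the stdlib converts int keys to strings.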
What Do You Use?
Are you still using the stdlib json module? Have you tried orjson? Any other libraries I should benchmark?
I benchmark developer tools and build security scanners. More benchmarks coming soon.