Motivating Example
The other day I was experimenting with an application I wrote about a year ago that relies on a set of rules to process some data. Due to some concerns with ease of use and extensibility, I decided to redesign the rule schema and use Pydantic to parse them instead of Python's builtin `dataclass`.
Once the new schema was finished, I ran some performance tests, which showed that the new schema led to a 70% decrease in application performance. Part of this was due to changes in the structure of the rules, but the other factor was the switch to Pydantic. I decided to do some dedicated benchmarking to see how much of a difference that switch alone made.
Pydantic Overview
If you work with backend APIs in Python, you've probably used or heard of Pydantic, perhaps from FastAPI. The library advertises itself as "data validation and settings management using Python type annotations" and it makes (de)serialization of data a breeze.¹
Consider the following example using Python's builtin `dataclass`:
```python
from dataclasses import dataclass


@dataclass
class User:
    id: int
    name: str
```
If you pass data to this object's constructor that doesn't match the specified types, Python will still gladly create the object for you:
```python
user = User(**{"id": "ABC", "name": 39})
print(user)
# > User(id='ABC', name=39)
```
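If you want type checking with plain dataclasses, you have to write it yourself. As a rough sketch (not from my original application), a hand-rolled `__post_init__` might look like this:

```python
from dataclasses import dataclass


@dataclass
class ValidatedUser:
    id: int
    name: str

    def __post_init__(self):
        # Hand-rolled checks; Pydantic performs this kind of validation automatically
        if not isinstance(self.id, int):
            raise TypeError(f"id must be int, got {type(self.id).__name__}")
        if not isinstance(self.name, str):
            raise TypeError(f"name must be str, got {type(self.name).__name__}")
```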
Using Pydantic instead, we will get an error² if we try the same thing:
```python
from pydantic import BaseModel


class User(BaseModel):
    id: int
    name: str


user = User(**{"id": "ABC", "name": 39})
# > ValidationError: 1 validation error for User
# > id
# >   value is not a valid integer (type=type_error.integer)
```
Hopefully this illustrates one of the many benefits of using Pydantic. I invite you to read the documentation to see the full capabilities of the library beyond this contrived example.
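Notice that the error above is only for `id`: by default Pydantic coerces compatible values rather than rejecting them, which is why the integer 39 passed as `name`. A quick illustration (output shown is from Pydantic v1, as used above):

```python
# Pydantic coerces compatible values instead of raising errors
user = User(**{"id": "1", "name": 39})
print(user)
# > id=1 name='39'
```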
New Performance Claims
Pydantic's capabilities can make your programs more resilient and easier to read and write, but you may sacrifice some performance for these benefits. To address these concerns, the creators of Pydantic endeavored to rewrite the backend "with validation and serialisation logic implemented in Rust" to boost performance by "5-50x" over v1 in the new major version coming soon.³
For those unfamiliar, Pydantic is currently implemented in Python and this rewrite shifts most of the code to Rust, a systems programming language touted as "blazingly fast" and safe.
If you're familiar with Rust, these kinds of gains may not seem unreasonable, but "5-50x" is still a big claim, especially in the notoriously slow world of Python. I'm a fan of Rust, but I wanted to verify these claims for myself and understand how the results compare to Python's builtin functionality.
Benchmarking
In my original example, the main bottleneck turned out to be data deserialization: converting a "rule" into a Pydantic object.
To test this, we will set up some benchmarks using pytest-benchmark and some sample data with a simple schema, and compare results between Python's `dataclass`, Pydantic v1, and Pydantic v2.
Setup
Here is the test setup, which uses a simple model of a user:
```python
from dataclasses import dataclass

import pytest
from pydantic import BaseModel


@dataclass
class UserDC:
    id: int
    first_name: str
    last_name: str
    age: int
    email: str


class UserPY(BaseModel):
    id: int
    first_name: str
    last_name: str
    age: int
    email: str


@pytest.mark.benchmarks
def test_dc_parse_bench(test_user, benchmark):
    user = benchmark.pedantic(UserDC, kwargs=test_user, iterations=10, rounds=50_000)
    assert user.id == 1


@pytest.mark.benchmarks
def test_py_parse_bench(test_user, benchmark):
    user = benchmark.pedantic(UserPY, kwargs=test_user, iterations=10, rounds=50_000)
    assert user.id == 1
```
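The `test_user` fixture just supplies a payload matching the schema; a minimal version (field values assumed for illustration) might live in `conftest.py`:

```python
# conftest.py
import pytest


@pytest.fixture
def test_user():
    # Valid payload matching both UserDC and UserPY
    return {
        "id": 1,
        "first_name": "Jane",
        "last_name": "Doe",
        "age": 30,
        "email": "jane.doe@example.com",
    }
```

With the fixture in place, the suite can be run with `pytest -m benchmarks` under each Pydantic version.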
We are using Python's `dataclass` as a baseline for comparison since I will need to run these tests with two different versions of Pydantic installed.
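To confirm which Pydantic version is active in a given environment before each run, a quick check like this works:

```python
import pydantic

# VERSION is exposed by both Pydantic v1 and v2
print(pydantic.VERSION)  # e.g. "1.10.7" on the first run, "2.0" on the second
```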
All benchmarks are run on a 2021 MacBook Pro with M1 Pro and 32GB RAM with the following environment:
```
Test session starts (platform: darwin, Python 3.11.3, pytest 7.3.1, pytest-sugar 0.9.7)
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
```
Benchmark Results
Using Pydantic v1
Results reproduced for visibility:
| Name (time in ns) | Min | Max | Mean | StdDev | Median | IQR | Outliers | OPS (Kops/s) | Rounds | Iterations |
|---|---|---|---|---|---|---|---|---|---|---|
| test_dc_parse_bench | 166.5991 (1.0) | 1,791.5998 (1.0) | 178.0631 (1.0) | 19.0275 (1.0) | 179.0992 (1.0) | 4.1997 (1.0) | 210;1391 | 5,615.9867 (1.0) | 50000 | 10 |
| test_py_parse_bench | 3,170.7998 (19.03) | 9,162.4999 (5.11) | 3,227.0666 (18.12) | 69.3922 (3.65) | 3,216.6994 (17.96) | 20.7001 (4.93) | 2192;2766 | 309.8789 (0.06) | 50000 | 10 |
Using Pydantic v2
Reproduced for visibility:
| Name (time in ns) | Min | Max | Mean | StdDev | Median | IQR | Outliers | OPS (Mops/s) | Rounds | Iterations |
|---|---|---|---|---|---|---|---|---|---|---|
| test_dc_parse_bench | 170.8002 (1.0) | 1,754.0995 (1.0) | 183.2414 (1.0) | 14.7975 (1.0) | 183.3003 (1.0) | 4.2011 (1.0) | 542;2882 | 5.4573 (1.0) | 50000 | 10 |
| test_py_parse_bench | 741.6995 (4.34) | 4,591.6997 (2.62) | 768.6778 (4.19) | 21.8820 (1.48) | 766.6997 (4.18) | 8.3994 (2.00) | 2424;3224 | 1.3009 (0.24) | 50000 | 10 |
Key Takeaways
- Both Pydantic v1 and v2 perform significantly slower than `dataclass`
- Pydantic's performance varies more widely than `dataclass`'s
- Pydantic v2 performs significantly faster than v1
- Pydantic v2's performance varies less than v1's
Analysis
The results here confirm my suspicion that switching from `dataclass` to Pydantic was a significant factor in the performance degradation that sparked this investigation. We can see that, on average, Pydantic v1 is about 18x slower than `dataclass`.
We can also see that Pydantic v2 is, on average, about 4x faster than v1 for this particular use-case. If you check another post from Pydantic, you may see the range "4-50x" instead of the aforementioned "5-50x", which technically means these results meet their claims, even if just barely. I won't split hairs over it since this example is incredibly simple and not the likely target of optimization.
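As a quick sanity check, those ratios can be computed directly from the mean times in the tables above:

```python
# Mean times (ns) taken from the benchmark tables above
dc_v1, py_v1 = 178.0631, 3_227.0666  # dataclass vs Pydantic v1 run
dc_v2, py_v2 = 183.2414, 768.6778    # dataclass vs Pydantic v2 run

print(f"v1 vs dataclass: {py_v1 / dc_v1:.1f}x slower")  # ~18.1x
print(f"v2 vs dataclass: {py_v2 / dc_v2:.1f}x slower")  # ~4.2x
print(f"v2 vs v1:        {py_v1 / py_v2:.1f}x faster")  # ~4.2x
```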
What's important to note here is that even "just" 4x performance improvements can be a major win, especially if these gains can be achieved with little to no code changes.⁴ For me, this is an easy win to recoup some of the performance losses I was facing in my initial example.
I encourage you to check out the official benchmarks for more realistic and detailed examples, and, as always, YMMV.
Conclusions
Will Pydantic's new major release live up to the hype? In most cases you will probably see improvements near the lower bound of their estimates, but, as mentioned, even that can be a big win. In my opinion, Pydantic brings a number of enhancements to Python applications that more than make up for its performance costs.
It's worth noting that these improvements will also impact other libraries and frameworks that rely on Pydantic, such as FastAPI and AWS Lambda Powertools, which could deliver some transitive performance improvements to various projects that don't directly depend on Pydantic themselves.