Investigating Pydantic v2's Bold Performance Claims

Donovan — Wed, 17 May 2023 18:51:49 +0000

Motivating Example

The other day I was experimenting with an application I wrote about a year ago that relies on a set of rules to process some data. Due to some concerns with ease of use and extensibility, I decided to redesign the rule schema and use Pydantic to parse them instead of Python's builtin dataclass.

Once the new schema was finished, I ran some performance tests which showed the new schema led to a 70% decrease in application performance. Part of this was due to changes in the structure of the rules, but the other aspect was switching to Pydantic. I decided to do some performance testing to see how much of a difference this change made.

Pydantic Overview

If you work with backend APIs in Python, you've probably used or heard of Pydantic, perhaps from FastAPI. The library advertises itself as "data validation and settings management using Python type annotations" and it makes (de)serialization of data a breeze.¹

Consider the following example using Python's builtin dataclass:

from dataclasses import dataclass

@dataclass
class User:
    id: int
    name: str

If you pass data to this object's constructor that doesn't match the specified types, Python will still gladly create the object for you:

user = User(**{"id": "ABC", "name": 39})
print(user)
# > User(id='ABC', name=39)

Using Pydantic instead, we will get an error² if we try the same thing:

from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str

user = User(**{"id": "ABC", "name": 39})
# > ValidationError: 1 validation error for User
# > id
# >   value is not a valid integer (type=type_error.integer)

Hopefully this illustrates one of the many benefits of using Pydantic. I invite you to read the documentation to see the full capabilities of the library beyond this contrived example.
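To make the contrast concrete, here is a rough sketch of the kind of check you would otherwise write by hand on a plain dataclass. This is not how Pydantic works internally; `CheckedUser` is a hypothetical name, and the check only handles simple annotations such as `int` and `str`, with no coercion.

```python
from dataclasses import dataclass
from typing import get_type_hints


@dataclass
class CheckedUser:
    id: int
    name: str

    def __post_init__(self):
        # Compare each field's runtime value against its annotation.
        for field_name, expected in get_type_hints(type(self)).items():
            value = getattr(self, field_name)
            if not isinstance(value, expected):
                raise TypeError(
                    f"{field_name}: expected {expected.__name__}, "
                    f"got {type(value).__name__}"
                )


try:
    CheckedUser(id="ABC", name="Donovan")
except TypeError as exc:
    print(exc)
# > id: expected int, got str
```

Even this small amount of checking adds work to every object construction, which is the trade-off the rest of this post measures.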

New Performance Claims

Pydantic's capabilities can make your programs more resilient and easier to read and write, but you may sacrifice some performance for these benefits. To address these concerns, the creators of Pydantic endeavored to rewrite the backend "with validation and serialisation logic implemented in Rust" to boost performance by "5-50x" over v1 in the new major version coming soon.³

For those unfamiliar, Pydantic is currently implemented in Python and this rewrite shifts most of the code to Rust, a systems programming language touted as "blazingly fast" and safe.

If you're familiar with Rust then these types of gains may not seem unreasonable, but "5-50x" is still a big claim, especially in the notoriously slow world of Python. I'm a big fan of Rust, but I wanted to verify these claims for myself and understand how they compare to Python's builtin functionality.

Benchmarking

In my original example, my main bottleneck turned out to be data deserialization: converting a "rule" into a Pydantic object.

To test this, we will set up some benchmarks using pytest-benchmark and some sample data with a simple schema, then compare results across Python's dataclass, Pydantic v1, and Pydantic v2.

Setup

Here is the test setup which uses a simple model of a user:

from dataclasses import dataclass

import pytest
from pydantic import BaseModel


@dataclass
class UserDC:
    id: int
    first_name: str
    last_name: str
    age: int
    email: str


class UserPY(BaseModel):
    id: int
    first_name: str
    last_name: str
    age: int
    email: str


@pytest.mark.benchmarks
def test_dc_parse_bench(test_user, benchmark):
    # test_user is a fixture supplying the constructor kwargs (not shown here)
    user = benchmark.pedantic(UserDC, kwargs=test_user, iterations=10, rounds=50_000)
    assert user.id == 1


@pytest.mark.benchmarks
def test_py_parse_bench(test_user, benchmark):
    # Same input, but construction goes through Pydantic's validation
    user = benchmark.pedantic(UserPY, kwargs=test_user, iterations=10, rounds=50_000)
    assert user.id == 1

We are using Python's dataclass as a baseline for comparison since the tests have to be run twice, once with each version of Pydantic installed, and the dataclass result gives both runs a common reference point.

All benchmarks are run on a 2021 MacBook Pro with M1 Pro and 32GB RAM with the following environment:

Test session starts (platform: darwin, Python 3.11.3, pytest 7.3.1, pytest-sugar 0.9.7)
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
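For readers unfamiliar with pytest-benchmark's pedantic mode, the `rounds` and `iterations` arguments can be pictured roughly as the loop below. This is a simplified stdlib sketch, not pytest-benchmark's actual implementation. `time_rounds` and `sample_user` are hypothetical names, and the sample values stand in for the `test_user` fixture, which is not shown in this post. It uses fewer rounds so it finishes quickly.

```python
import time
from dataclasses import dataclass


@dataclass
class UserDC:
    id: int
    first_name: str
    last_name: str
    age: int
    email: str


# Hypothetical sample data standing in for the `test_user` fixture.
sample_user = {
    "id": 1,
    "first_name": "Ada",
    "last_name": "Lovelace",
    "age": 36,
    "email": "ada@example.com",
}


def time_rounds(target, kwargs, iterations, rounds):
    """Return the average per-call time in nanoseconds for each round."""
    per_call_ns = []
    for _ in range(rounds):
        start = time.perf_counter()
        for _ in range(iterations):
            target(**kwargs)
        elapsed = time.perf_counter() - start
        # One data point per round: total round time split across its calls.
        per_call_ns.append(elapsed / iterations * 1e9)
    return per_call_ns


timings = time_rounds(UserDC, sample_user, iterations=10, rounds=1_000)
print(f"min={min(timings):.1f} ns  mean={sum(timings) / len(timings):.1f} ns")
```

Averaging over several iterations per round smooths out timer resolution for calls this fast, while many rounds give enough data points for the statistics reported in the tables.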

Benchmark Results

Using Pydantic `v1`

Results reproduced for visibility:

| Name (time in ns) | Min | Max | Mean | StdDev | Median | IQR | Outliers | OPS (Kops/s) | Rounds | Iterations |
|---|---|---|---|---|---|---|---|---|---|---|
| test_dc_parse_bench | 166.5991 (1.0) | 1,791.5998 (1.0) | 178.0631 (1.0) | 19.0275 (1.0) | 179.0992 (1.0) | 4.1997 (1.0) | 210;1391 | 5,615.9867 (1.0) | 50000 | 10 |
| test_py_parse_bench | 3,170.7998 (19.03) | 9,162.4999 (5.11) | 3,227.0666 (18.12) | 69.3922 (3.65) | 3,216.6994 (17.96) | 20.7001 (4.93) | 2192;2766 | 309.8789 (0.06) | 50000 | 10 |

Using Pydantic `v2`

Reproduced for visibility:

| Name (time in ns) | Min | Max | Mean | StdDev | Median | IQR | Outliers | OPS (Mops/s) | Rounds | Iterations |
|---|---|---|---|---|---|---|---|---|---|---|
| test_dc_parse_bench | 170.8002 (1.0) | 1,754.0995 (1.0) | 183.2414 (1.0) | 14.7975 (1.0) | 183.3003 (1.0) | 4.2011 (1.0) | 542;2882 | 5.4573 (1.0) | 50000 | 10 |
| test_py_parse_bench | 741.6995 (4.34) | 4,591.6997 (2.62) | 768.6778 (4.19) | 21.8820 (1.48) | 766.6997 (4.18) | 8.3994 (2.00) | 2424;3224 | 1.3009 (0.24) | 50000 | 10 |

Key Takeaways

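Both Pydantic v1 and v2 are significantly slower than dataclass (about 18x and 4x slower on average, respectively)
Pydantic's timings vary more widely than dataclass's in absolute terms (StdDev of 69.4 ns and 21.9 ns versus 19.0 ns and 14.8 ns)
Pydantic v2 is significantly faster than v1 (mean of about 769 ns versus 3,227 ns)
Pydantic v2's timings vary less than v1's in absolute terms (StdDev of 21.9 ns versus 69.4 ns)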
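The variability points above compare the StdDev column directly, so they are statements about absolute spread. As a quick cross-check, the sketch below divides StdDev by Mean for each row of the two tables (the dictionary keys are my own labels). Relative to their means, the Pydantic timings are actually tighter than the dataclass ones, at roughly 2-3% versus 8-11%, so the takeaways are best read in absolute nanoseconds.

```python
# (mean_ns, stddev_ns) copied from the Mean and StdDev columns above.
results = {
    "dataclass (v1 run)": (178.0631, 19.0275),
    "pydantic v1": (3227.0666, 69.3922),
    "dataclass (v2 run)": (183.2414, 14.7975),
    "pydantic v2": (768.6778, 21.8820),
}

# Coefficient of variation: spread expressed as a fraction of the mean.
relative_spread = {
    name: stddev / mean for name, (mean, stddev) in results.items()
}

for name, cv in relative_spread.items():
    print(f"{name}: {cv:.2%}")
```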

Analysis

The results here confirm my suspicion that switching from dataclass to Pydantic was a significant factor in the performance degradation that sparked this investigation. We can see that, on average, Pydantic v1 is about 18x slower than dataclass.

We can also see that Pydantic v2 is, on average, about 4x faster than v1 for this particular use-case. If you check another post from Pydantic, you may see the range "4-50x" instead of the aforementioned "5-50x", which technically means these results meet their claims, even if just barely. I won't split hairs over it since this example is incredibly simple and not the likely target of optimization.
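The ratios quoted above can be reproduced from the Mean column of the two tables. Here is a small sketch (variable names are my own). Because the two Pydantic versions were measured in separate runs, it also normalizes each Pydantic mean by that run's dataclass mean before comparing, which is the reason for keeping the dataclass baseline in both runs.

```python
# Mean times in nanoseconds, copied from the two result tables.
dc_mean_v1_run = 178.0631
py_v1_mean = 3227.0666
dc_mean_v2_run = 183.2414
py_v2_mean = 768.6778

# Slowdown of each Pydantic version relative to dataclass in the same run.
v1_vs_dc = py_v1_mean / dc_mean_v1_run
v2_vs_dc = py_v2_mean / dc_mean_v2_run

# Speedup of v2 over v1: raw, and normalized by each run's baseline.
raw_speedup = py_v1_mean / py_v2_mean
normalized_speedup = v1_vs_dc / v2_vs_dc

print(f"v1 vs dataclass: {v1_vs_dc:.2f}x slower")       # 18.12x
print(f"v2 vs dataclass: {v2_vs_dc:.2f}x slower")       # 4.19x
print(f"v2 vs v1 (raw): {raw_speedup:.2f}x faster")     # 4.20x
print(f"v2 vs v1 (normalized): {normalized_speedup:.2f}x faster")  # 4.32x
```

Either way, the speedup lands a little above 4x for this simple model.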

What's important to note here is even "just" 4x performance improvements can be a major win, especially if these gains can be achieved with little to no changes.⁴ For me, this is an easy win to recoup some of the performance losses I was facing in my initial example.

I encourage you to check out the official benchmarks for more realistic and detailed examples, and, as always, YMMV.

Conclusions

Will Pydantic's new major release live up to the hype? In most cases you will probably see improvements on the lower bound of their estimations, but, as mentioned, even this can be a big win. In my opinion, Pydantic brings a number of enhancements to Python applications that more than make up for any of its performance losses.

It's worth noting that these improvements will also impact other libraries and frameworks that rely on Pydantic, such as FastAPI and AWS Lambda Powertools, which could deliver some transitive performance improvements to various projects that don't directly depend on Pydantic themselves.

Footnotes

[1]

[2] This example will actually coerce `39` into a string using Pydantic `v1` if you resolve the type error on `id`. Using Pydantic `v2` will instead report a validation error for both fields.

[3]

[4] See the migration guide for specifics on upgrading

DEV Community: Donovan
